Foreword


Until recently, finance theory appeared to be reaching a triumphant climax. Many 
years ago, Harry Markowitz and William Sharpe had shown how diversification 
could reduce risk. In 1973, Fischer Black, Myron Scholes and Robert C. Merton 
went further by conjuring away risk completely, using the magic trick of dynamic 
replication. Twenty-five years later, a multi-trillion dollar derivatives industry had 
grown up around these insights. And of these five founding fathers, only Black 
missed out on a Nobel prize due to his tragic early death. Black, Scholes and 
Merton’s option pricing breakthrough depended on the idea that hungry arbitrage 
traders were constantly prowling the markets, forcing prices to match theoretical 
predictions. The hedge fund Long-Term Capital Management, which included
Scholes and Merton as partners, was founded with this principle at its core. So
strong was LTCM’s faith in these theories that it used leverage to make enormous 
bets on small discrepancies from the predictions of finance theory. We all know 
what happened next. In August and September 1998, the fund lost $4.5 billion, 
roughly 90% of its value, and had to be bailed out by its 14 biggest counterparties. 
Global markets were severely disrupted for several months. All the shibboleths of 
finance theory, in particular diversification and replication, proved to be false gods, 
and the reputation of quants suffered badly as a result. Traditionally, finance texts 
take these shibboleths as a starting point, and build on them. Empirical verification 
is given scant attention, and the consequences of violating the key assumptions 
are often ignored completely. The result is a culture where markets get blamed
if the theory breaks down, rather than vice versa, as it should be. Unsurprisingly, 
traders accuse some quants of having an ivory-tower mentality. Now, here come 
Bouchaud and Potters. Without eschewing rigour, they approach finance theory 
with a sceptical eye. All the familiar results (efficient portfolios, Black-Scholes
and so on) are here, but with a strongly empirical flavour. There are also some
useful additions to the existing toolkit, such as random matrix theory. Perhaps
one day, theorists will show that the exact Black-Scholes regime is an unstable,
pathological state rather than the utopia it was formerly thought to be. Until then,
quants will find this book a useful survival guide in the real world. 


Nick Dunbar 
Technical Editor, Risk Magazine 
Author of Inventing Money (John Wiley and Sons, 2000) 
Preface


Finance is a rapidly expanding field of science, with a rather unique link to 
applications. Correspondingly, recent years have witnessed the growing role of 
financial engineering in market rooms. The possibility of easily accessing and 
processing huge quantities of data on financial markets opens the path to new 
methodologies, where systematic comparison between theories and real data not 
only becomes possible, but mandatory. This perspective has spurred the interest of 
the statistical physics community, with the hope that methods and ideas developed 
in the past decades to deal with complex systems could also be relevant in finance. 
Correspondingly, many holders of PhDs in physics are now taking jobs in banks or 
other financial institutions. 

However, the existing literature roughly falls into two categories: either rather 
abstract books from the mathematical finance community, which are very difficult 
for people trained in natural sciences to read, or more professional books, where the 
scientific level is usually quite poor.1 In particular, there is in this context no book
discussing the physicists’ way of approaching scientific problems, in particular a 
systematic comparison between ‘theory’ and ‘experiments’ (i.e. empirical results), 
the art of approximations and the use of intuition. Moreover, even in excellent 
books on the subject, such as the one by J. C. Hull, the point of view on derivatives 
is the traditional one of Black and Scholes, where the whole pricing methodology 
is based on the construction of riskless strategies. The idea of zero risk is
counter-intuitive and the reason for the existence of these riskless strategies in the
Black-Scholes theory is buried in the premises of Ito’s stochastic differential rules.

It is our belief that a more intuitive understanding of these theories is needed
for a better overall control of financial risks. The models discussed in Theory of
Financial Risk are devised to account for real markets’ statistics where the
construction of riskless hedges is in general impossible. The mathematical framework
required to deal with these cases is however not more complicated, and has the
advantage of making the issues at stake, in particular the problem of risk, more
transparent.

1 There are notable exceptions, such as the remarkable book by J. C. Hull, Futures, Options and Other
Derivatives, Prentice Hall, 1997.

2 See however: I. Kondor, J. Kertesz (Eds): Econophysics, an Emerging Science, Kluwer, Dordrecht (1999); R.
Mantegna and H. E. Stanley, An Introduction to Econophysics, Cambridge University Press (1999).

Finally, commercial software packages are being developed to measure and 
control financial risks (some following the ideas developed in this book).3 We hope
that this book can be useful to all people concerned with financial risk control, by 
discussing at length the advantages and limitations of various statistical models. 

Despite our efforts to remain simple, certain sections are still quite technical. 
We have used a smaller font to develop more advanced ideas, which are not crucial 
to the understanding of the main ideas. Whole sections, marked by a star (*), contain
rather specialized material and can be skipped at first reading. We have tried to be as 
precise as possible, but have sometimes been somewhat sloppy and non-rigorous. 
For example, the idea of probability is not axiomatized: its intuitive meaning is 
more than enough for the purpose of this book. The notation P(·) means the
probability distribution for the variable which appears between the parentheses, and
not a well-determined function of a dummy variable. The notation x → ∞ does
not necessarily mean that x tends to infinity in a mathematical sense, but rather that
x is large. Instead of trying to derive results which hold true in any circumstances,
we often compare orders of magnitude of the different effects: small effects are
neglected, or included perturbatively.4 

Finally, we have not tried to be comprehensive, and have left out a number of 
important aspects of theoretical finance. For example, the problem of interest rate 
derivatives (swaps, caps, swaptions...) is not addressed: we feel that the present
models of interest rate dynamics are not satisfactory (see the discussion in Section
2.0). Correspondingly, we have not tried to give an exhaustive list of references, but 
rather to present our own way of understanding the subject. A certain number of 
important references are given at the end of each chapter, while more specialized 
papers are given as footnotes where we have found it necessary.

This book is divided into five chapters. Chapter 1 deals with important results
in probability theory (the Central Limit Theorem and its limitations, the theory of
extreme value statistics, etc.). The statistical analysis of real data, and the empirical 
determination of the statistical laws, are discussed in Chapter 2. Chapter 3 is 
concerned with the definition of risk, value-at-risk, and the theory of optimal
portfolio, in particular in the case where the probability of extreme risks has to be
minimized. The problem of forward contracts and options, their optimal hedge and
the residual risk is discussed in detail in Chapter 4. Finally, some more advanced
topics on options are introduced in Chapter 5 (such as exotic options, or the role of
transaction costs). A short glossary of financial terms, an index and a list of
symbols are given at the end of the book, allowing one to find easily where each
symbol or word was used and defined for the first time.

3 For example, the software Profiler, commercialized by the company ATSM, relies heavily on the concepts
introduced in Chapter 3.

4 $a \simeq b$ means that $a$ is of order $b$, $a \ll b$ means that $a$ is smaller than, say, $b/10$. A computation neglecting
terms of order $(a/b)^2$ is therefore accurate to 1%. Such a precision is usually enough in the financial context,
where the uncertainty on the value of the parameters (such as the average return, the volatility, etc.) is often
larger than 1%.

This book appeared in its first edition in French, under the title: Théorie des
Risques Financiers, Aléa-Saclay-Eyrolles, Paris (1997). Compared to this first 
edition, the present version has been substantially improved and augmented. For 
example, we discuss the theory of random matrices and the problem of the interest 
rate curve, which were absent from the first edition. Furthermore, several points 
have been corrected or clarified. 
Probability theory: basic notions


All epistemologic value of the theory of probability is based on this: that large scale 
random phenomena in their collective action create strict, non random regularity. 


(Gnedenko and Kolmogorov, Limit Distributions for Sums of Independent Random 
Variables.) 


1.1 Introduction 


Randomness stems from our incomplete knowledge of reality, from the lack of 
information which forbids a perfect prediction of the future. Randomness arises 
from complexity, from the fact that causes are diverse, that tiny perturbations 
may result in large effects. For over a century now, Science has abandoned 
Laplace’s deterministic vision, and has fully accepted the task of deciphering
randomness and inventing adequate tools for its description. The surprise is that,
after all, randomness has many facets and that there are many levels to uncertainty,
but, above all, that a new form of predictability appears, which is no longer 
deterministic but statistical. 

Financial markets offer an ideal testing ground for these statistical ideas. 
The fact that a large number of participants, with divergent anticipations and 
conflicting interests, are simultaneously present in these markets, leads to an 
unpredictable behaviour. Moreover, financial markets are (sometimes strongly) 
affected by external news, which are, both in date and in nature, to a large degree
unexpected. The statistical approach consists in drawing from past observations 
some information on the frequency of possible price changes. If one then assumes 
that these frequencies reflect some intimate mechanism of the markets themselves, 
then one may hope that these frequencies will remain stable in the course of 
time. For example, the mechanism underlying the roulette or the game of dice 
is obviously always the same, and one expects that the frequency of all possible 
outcomes will be invariant in time, although of course each individual outcome is
random.

This ‘bet’ that probabilities are stable (or better, stationary) is very reasonable 
in the case of roulette or dice; it is nevertheless much less justified in the case 
of financial markets — despite the large number of participants which confer to the 
system a certain regularity, at least in the sense of Gnedenko and Kolmogorov. 
It is clear, for example, that financial markets do not behave now as they did 30 
years ago: many factors contribute to the evolution of the way markets behave 
(development of derivative markets, world-wide and computer-aided trading, etc.). 
As will be mentioned in the following, ‘young’ markets (such as emergent 
countries markets) and more mature markets (exchange rate markets, interest rate 
markets, etc.) behave quite differently. The statistical approach to financial markets 
is based on the idea that whatever evolution takes place, this happens sufficiently 
slowly (on the scale of several years) so that the observation of the recent past 
is useful to describe a not too distant future. However, even this ‘weak stability’ 
hypothesis is sometimes badly in error, in particular in the case of a crisis, which 
marks a sudden change of market behaviour. The recent example of some Asian 
currencies indexed to the dollar (such as the Korean won or the Thai baht) is 
interesting, since the observation of past fluctuations is clearly of no help to predict 
the amplitude of the sudden turmoil of 1997, see Figure 1.1. 

Hence, the statistical description of financial fluctuations is certainly imperfect. 
It is nevertheless extremely helpful: in practice, the ‘weak stability’ hypothesis is 
in most cases reasonable, at least to describe risks.2

In other words, the amplitude of the possible price changes (but not their sign!) 
is, to a certain extent, predictable. It is thus rather important to devise adequate
tools, in order to control (if at all possible) financial risks. The goal of this first 
chapter is to present a certain number of basic notions in probability theory, which 
we shall find useful in the following. Our presentation does not aim at mathematical 
rigour, but rather tries to present the key concepts in an intuitive way, in order to 
ease their empirical use in practical applications. 


1.2 Probabilities 
1.2.1 Probability distributions 


Contrary to the throw of a die, which can only return an integer between 1
and 6, the variation of price of a financial asset3 can be arbitrary (we disregard
the fact that price changes cannot actually be smaller than a certain quantity, a
‘tick’). In order to describe a random process X for which the result is a real
number, one uses a probability density P(x), such that the probability that X is
within a small interval of width dx around X = x is equal to P(x) dx. In the
following, we shall denote as P(·) the probability density for the variable appearing
as the argument of the function. This is a potentially ambiguous, but very useful
notation.

1 The idea that science ultimately amounts to making the best possible guess of reality is due to R. P. Feynman
(Seeking New Laws, in The Character of Physical Law, MIT Press, Cambridge, MA, 1965).

2 The prediction of future returns on the basis of past returns is however much less justified.

3 Asset is the generic name for a financial instrument which can be bought or sold, like stocks, currencies, gold,
bonds, etc.

Fig. 1.1. Three examples of statistically unforeseen crashes: the Korean won against the
dollar in 1997 (top), the British 3-month short-term interest rates futures in 1992 (middle),
and the S&P 500 in 1987 (bottom). In the example of the Korean won, it is particularly
clear that the distribution of price changes before the crisis was extremely narrow, and
could not be extrapolated to anticipate what happened in the crisis period.
The probability that X is between a and b is given by the integral of P(x)
between a and b,

$$\mathcal{P}(a < X < b) = \int_a^b P(x)\,dx. \tag{1.1}$$

In the following, the notation $\mathcal{P}(\cdot)$ means the probability of a given event, defined
by the content of the parentheses $(\cdot)$.

The function P(x) is a density; in this sense it depends on the units used to 
measure X. For example, if X is a length measured in centimetres, P(x) is a 
probability density per unit length, i.e. per centimetre. The numerical value of P(x) 
changes if X is measured in inches, but the probability that X lies between two
specific values $l_1$ and $l_2$ is of course independent of the chosen unit. P(x) dx is thus
invariant upon a change of unit, i.e. under the change of variable $x \to \gamma x$. More
generally, P(x) dx is invariant upon any (monotonic) change of variable $x \to y(x)$:
in this case, one has P(x) dx = P(y) dy.
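As a quick numerical check of this invariance, the sketch below takes a hypothetical Gaussian density for a length measured in centimetres (the mean of 170 cm and the width of 10 cm are illustrative assumptions, not values from the text), rewrites it per inch, and integrates both densities over the same physical interval. The helper names `p_cm`, `p_inch` and `integrate` are ours.

```python
import math

CM_PER_INCH = 2.54

def p_cm(x, m=170.0, s=10.0):
    """Hypothetical density per centimetre (Gaussian, illustrative parameters)."""
    return math.exp(-((x - m) ** 2) / (2 * s * s)) / math.sqrt(2 * math.pi * s * s)

def p_inch(y):
    """Same law expressed per inch: P(y) = P(x) |dx/dy| with x = 2.54 y."""
    return CM_PER_INCH * p_cm(CM_PER_INCH * y)

def integrate(f, a, b, n=10_000):
    """Midpoint rule for the integral of f between a and b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

l1_cm, l2_cm = 160.0, 185.0
prob_cm = integrate(p_cm, l1_cm, l2_cm)
prob_inch = integrate(p_inch, l1_cm / CM_PER_INCH, l2_cm / CM_PER_INCH)

# The density changes with the unit, the probability of the interval does not.
print(f"density at 170 cm: {p_cm(170.0):.5f} per cm, "
      f"{p_inch(170.0 / CM_PER_INCH):.5f} per inch")
print(f"probability computed in cm: {prob_cm:.6f}, in inches: {prob_inch:.6f}")
```

The numerical value of the density is multiplied by 2.54 when going from centimetres to inches, while the two integrals agree up to rounding.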

In order to be a probability density in the usual sense, P(x) must be non-negative
($P(x) \geq 0$ for all x) and must be normalized, that is, the integral of P(x) over
the whole range of possible values for X must be equal to one:

$$\int_{x_m}^{x_M} P(x)\,dx = 1, \tag{1.2}$$

where $x_m$ (resp. $x_M$) is the smallest value (resp. largest) which X can take. In the
case where the possible values of X are not bounded from below, one takes $x_m = -\infty$,
and similarly for $x_M$. One can actually always assume the bounds to be $\pm\infty$
by setting to zero P(x) in the intervals $]-\infty, x_m]$ and $[x_M, \infty[$. Later in the text,
we shall often use the symbol $\int$ as a shorthand for $\int_{-\infty}^{+\infty}$.

An equivalent way of describing the distribution of X is to consider its cumulative
distribution $P_<(x)$, defined as:

$$P_<(x) \equiv \mathcal{P}(X < x) = \int_{-\infty}^{x} P(x')\,dx'. \tag{1.3}$$

$P_<(x)$ takes values between zero and one, and is monotonically increasing with
x. Obviously, $P_<(-\infty) = 0$ and $P_<(+\infty) = 1$. Similarly, one defines
$P_>(x) = 1 - P_<(x)$.


1.2.2 Typical values and deviations 


It is quite natural to speak about ‘typical’ values of X. There are at least three
mathematical definitions of this intuitive notion: the most probable value, the
median and the mean. The most probable value $x^*$ corresponds to the maximum of
the function P(x); $x^*$ need not be unique if P(x) has several equivalent maxima.
The median $x_{med}$ is such that the probabilities that X be greater or less than this
particular value are equal. In other words, $P_<(x_{med}) = P_>(x_{med}) = 1/2$. The mean,
or expected value of X, which we shall note as m or $\langle x \rangle$ in the following, is the
average of all possible values of X, weighted by their corresponding probability:

$$m \equiv \langle x \rangle = \int x P(x)\,dx. \tag{1.4}$$

Fig. 1.2. The ‘typical value’ of a random variable X drawn according to a distribution
density P(x) can be defined in at least three different ways: through its mean value $\langle x \rangle$,
its most probable value $x^*$ or its median $x_{med}$. In the general case these three values are
distinct.


For a unimodal distribution (unique maximum), symmetrical around this maximum,
these three definitions coincide. However, they are in general different,
although often rather close to one another. Figure 1.2 shows an example of a
non-symmetric distribution, and the relative position of the most probable value,
the median and the mean.
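To make the three definitions concrete, here is a minimal sketch that evaluates them on a grid for an asymmetric density. The density $x e^{-x}$ on $x \geq 0$ is our own illustrative choice (it is not the curve of Fig. 1.2), and the function name `typical_values` is hypothetical.

```python
import math

def density(x):
    """Illustrative asymmetric density x * exp(-x) on x >= 0 (an assumption)."""
    return x * math.exp(-x) if x >= 0 else 0.0

def typical_values(pdf, lo, hi, n=200_000):
    """Most probable value, median and mean of pdf, using a midpoint grid on [lo, hi]."""
    dx = (hi - lo) / n
    xs = [lo + (i + 0.5) * dx for i in range(n)]
    ps = [pdf(x) for x in xs]
    norm = sum(ps) * dx                      # should be close to 1
    mean = sum(x * p for x, p in zip(xs, ps)) * dx / norm
    mode = xs[max(range(n), key=ps.__getitem__)]
    cumulative, median = 0.0, xs[-1]
    for x, p in zip(xs, ps):                 # first point where P_<(x) reaches 1/2
        cumulative += p * dx / norm
        if cumulative >= 0.5:
            median = x
            break
    return mode, median, mean

mode, median, mean = typical_values(density, 0.0, 50.0)
print(f"most probable value = {mode:.3f}, median = {median:.3f}, mean = {mean:.3f}")
```

For this right-skewed example the three values are distinct and ordered as most probable value < median < mean.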

One can then describe the fluctuations of the random variable X: if the random 
process is repeated several times, one expects the results to be scattered in a cloud 
of a certain ‘width’ in the region of typical values of X. This width can be described 
by the mean absolute deviation (MAD) $E_{abs}$, by the root mean square (RMS)
$\sigma$ (or, in financial terms, the volatility), or by the ‘full width at half maximum’
$w_{1/2}$.
The mean absolute deviation from a given reference value is the average of the
distance between the possible values of X and this reference value,4

$$E_{abs} \equiv \int |x - x_{med}|\, P(x)\,dx. \tag{1.5}$$

Similarly, the variance ($\sigma^2$) is the mean distance squared to the reference value m,

$$\sigma^2 \equiv \langle (x - m)^2 \rangle = \int (x - m)^2 P(x)\,dx. \tag{1.6}$$


Since the variance has the dimension of x squared, its square root (the RMS, $\sigma$)
gives the order of magnitude of the fluctuations around m.

Finally, the full width at half maximum $w_{1/2}$ is defined (for a distribution which
is symmetrical around its unique maximum $x^*$) such that $P(x^* \pm w_{1/2}/2) = P(x^*)/2$,
which corresponds to the points where the probability density has
dropped by a factor of two compared to its maximum value. One could actually
define this width slightly differently, for example such that the total probability to
find an event outside the interval $[x^* - w/2,\, x^* + w/2]$ is equal to, say, 0.1.

The pair mean-variance is actually much more popular than the pair median-MAD.
This comes from the fact that the absolute value is not an analytic function
of its argument, and thus does not possess the nice properties of the variance, such 
as additivity under convolution, which we shall discuss below. However, for the 
empirical study of fluctuations, it is sometimes preferable to use the MAD; it is 
more robust than the variance, that is, less sensitive to rare extreme events, which 
may be the source of large statistical errors. 
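The robustness of the MAD can be illustrated on a toy sample. The sketch below uses made-up numbers (100 ordinary observations, then the same sample with one extreme event added) and compares how much the RMS about the mean and the MAD about the median are inflated by that single point. The function names are ours.

```python
import math
import statistics

def rms_about_mean(xs):
    """Root mean square deviation from the sample mean."""
    m = statistics.fmean(xs)
    return math.sqrt(statistics.fmean([(x - m) ** 2 for x in xs]))

def mad_about_median(xs):
    """Mean absolute deviation from the sample median."""
    med = statistics.median(xs)
    return statistics.fmean([abs(x - med) for x in xs])

calm = [-1.0, -0.5, 0.0, 0.5, 1.0] * 20   # 100 ordinary observations (made up)
shocked = calm + [50.0]                    # the same sample plus one extreme event

rms_ratio = rms_about_mean(shocked) / rms_about_mean(calm)
mad_ratio = mad_about_median(shocked) / mad_about_median(calm)
print(f"RMS inflated by a factor {rms_ratio:.2f}, MAD by a factor {mad_ratio:.2f}")
```

With these illustrative numbers, one outlier among a hundred points multiplies the RMS several times over while the MAD changes by less than a factor of two.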


1.2.3 Moments and characteristic function


More generally, one can define higher-order moments of the distribution P(x) as 
the average of powers of X: 


$$m_n \equiv \langle x^n \rangle = \int x^n P(x)\,dx. \tag{1.7}$$

Accordingly, the mean m is the first moment (n = 1), and the variance is related
to the second moment ($\sigma^2 = m_2 - m^2$). The above definition, Eq. (1.7), is
only meaningful if the integral converges, which requires that P(x) decreases
sufficiently rapidly for large |x| (see below).

From a theoretical point of view, the moments are interesting: if they exist, their
knowledge is often equivalent to the knowledge of the distribution P(x) itself.5 In
practice however, the high order moments are very hard to determine satisfactorily:
as n grows, longer and longer time series are needed to keep a certain level of
precision on $m_n$; these high moments are thus in general not adapted to describe
empirical data.

4 One chooses as a reference value the median for the MAD and the mean for the RMS, because for a fixed
distribution P(x), these two quantities minimize, respectively, the MAD and the RMS.

5 This is not rigorously correct, since one can exhibit examples of different distribution densities which possess
exactly the same moments, see Section 1.3.2 below.

For many computational purposes, it is convenient to introduce the characteristic
function of P(x), defined as its Fourier transform:

$$\hat{P}(z) \equiv \int e^{izx} P(x)\,dx. \tag{1.8}$$

The function P(x) is itself related to its characteristic function through an inverse
Fourier transform:

$$P(x) = \frac{1}{2\pi} \int e^{-izx} \hat{P}(z)\,dz. \tag{1.9}$$

Since P(x) is normalized, one always has $\hat{P}(0) = 1$. The moments of P(x) can be
obtained through successive derivatives of the characteristic function at z = 0,

$$m_n = (-i)^n \left. \frac{d^n}{dz^n} \hat{P}(z) \right|_{z=0}. \tag{1.10}$$
One finally defines the cumulants $c_n$ of a distribution as the successive derivatives
of the logarithm of its characteristic function:

$$c_n = (-i)^n \left. \frac{d^n}{dz^n} \log \hat{P}(z) \right|_{z=0}. \tag{1.11}$$

The cumulant $c_n$ is a polynomial combination of the moments $m_p$ with $p \leq n$.
For example $c_2 = m_2 - m^2 = \sigma^2$. It is often useful to normalize the cumulants
by an appropriate power of the variance, such that the resulting quantities are
dimensionless. One thus defines the normalized cumulants $\lambda_n$,

$$\lambda_n \equiv c_n / \sigma^n. \tag{1.12}$$
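The derivative formulas for moments and cumulants can be checked numerically. The sketch below assumes, as a test case, the Gaussian characteristic function $\exp(-\sigma^2 z^2/2 + imz)$ (given later in the text as Eq. (1.16)) with illustrative parameters m = 1 and σ = 2, and estimates the derivatives at z = 0 by central finite differences. The step size and helper names are our own choices.

```python
import cmath

M, SIGMA = 1.0, 2.0  # illustrative parameters (assumptions)

def char_fn(z):
    """Gaussian characteristic function exp(-sigma^2 z^2 / 2 + i m z)."""
    return cmath.exp(-0.5 * SIGMA**2 * z**2 + 1j * M * z)

def log_char_fn(z):
    """Logarithm of the characteristic function (principal branch, fine near z = 0)."""
    return cmath.log(char_fn(z))

def derivative_at_zero(f, order, h=1e-3):
    """Central finite-difference estimate of the first or second derivative at z = 0."""
    if order == 1:
        return (f(h) - f(-h)) / (2 * h)
    if order == 2:
        return (f(h) - 2 * f(0.0) + f(-h)) / h**2
    raise ValueError("only orders 1 and 2 are implemented")

# m_n = (-i)^n d^n/dz^n P^(z) at z = 0, and c_n likewise with log P^(z).
m1 = ((-1j) ** 1 * derivative_at_zero(char_fn, 1)).real
m2 = ((-1j) ** 2 * derivative_at_zero(char_fn, 2)).real
c2 = ((-1j) ** 2 * derivative_at_zero(log_char_fn, 2)).real

print(f"m1 = {m1:.4f}, m2 = {m2:.4f}, c2 = {c2:.4f}, m2 - m1^2 = {m2 - m1**2:.4f}")
```

The first moment reproduces m, the second moment reproduces $\sigma^2 + m^2$, and the second cumulant agrees with $m_2 - m^2 = \sigma^2$, up to the finite-difference error.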


One often uses the third and fourth normalized cumulants, called the skewness and
kurtosis ($\kappa$),6

$$\lambda_3 = \frac{\langle (x-m)^3 \rangle}{\sigma^3}, \qquad \kappa \equiv \lambda_4 = \frac{\langle (x-m)^4 \rangle}{\sigma^4} - 3. \tag{1.13}$$

The above definition of cumulants may look arbitrary, but these quantities have
remarkable properties. For example, as we shall show in Section 1.5, the cumulants
simply add when one sums independent random variables. Moreover a Gaussian
distribution (or the normal law of Laplace and Gauss) is characterized by the
fact that all cumulants of order larger than two are identically zero. Hence the
cumulants, in particular $\kappa$, can be interpreted as a measure of the distance between
a given distribution P(x) and a Gaussian.

6 Note that it is sometimes $\kappa + 3$, rather than $\kappa$ itself, which is called the kurtosis.


1.2.4 Divergence of moments —asymptotic behaviour 


The moments (or cumulants) of a given distribution do not always exist. A
necessary condition for the nth moment ($m_n$) to exist is that the distribution density
P(x) should decay faster than $1/|x|^{n+1}$ for |x| going towards infinity, or else the
integral, Eq. (1.7), would diverge for |x| large. If one only considers distribution
densities that are behaving asymptotically as a power-law, with an exponent $1 + \mu$,

$$P(x) \sim \frac{\mu A_\pm^\mu}{|x|^{1+\mu}} \quad \text{for } x \to \pm\infty, \tag{1.14}$$

then all the moments such that $n \geq \mu$ are infinite. For example, such a distribution
has no finite variance whenever $\mu \leq 2$. [Note that, for P(x) to be a normalizable
probability distribution, the integral, Eq. (1.2), must converge, which requires
$\mu > 0$.]
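The divergence can be seen by cutting the moment integral off at a finite upper bound. The sketch below assumes the simplest possible case, a pure power-law density $\mu / x^{1+\mu}$ on $x \geq 1$ (that is, Eq. (1.14) with unit tail amplitude holding for all $x \geq 1$, which is our simplification), for which the truncated moment has a closed form. The function name `partial_moment` is hypothetical.

```python
import math

def partial_moment(n, mu, x_max):
    """Integral of x^n * mu / x^(1 + mu) from 1 to x_max, in closed form."""
    if n == mu:
        return mu * math.log(x_max)          # marginal case: logarithmic divergence
    return mu * (x_max ** (n - mu) - 1.0) / (n - mu)

mu = 1.5  # illustrative tail exponent, between 1 and 2
for x_max in (1e2, 1e4, 1e8):
    first = partial_moment(1, mu, x_max)     # n < mu: converges as x_max grows
    second = partial_moment(2, mu, x_max)    # n > mu: grows without bound
    print(f"cut-off {x_max:.0e}: truncated mean = {first:.4f}, "
          f"truncated second moment = {second:.1f}")
```

For μ = 1.5 the truncated mean settles towards μ/(μ − 1) = 3, whereas the truncated second moment keeps growing like the square root of the cut-off.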


The characteristic function of a distribution having an asymptotic power-law behaviour
given by Eq. (1.14) is non-analytic around z = 0. The small z expansion contains regular
terms of the form $z^n$ for $n < \mu$ followed by a non-analytic term $|z|^\mu$ (possibly with
logarithmic corrections such as $z^n \log z$ for integer $\mu$). The derivatives of order larger
or equal to $\mu$ of the characteristic function thus do not exist at the origin (z = 0).


1.3 Some useful distributions 
1.3.1 Gaussian distribution 


The most commonly encountered distributions are the ‘normal’ laws of Laplace 
and Gauss, which we shall simply call Gaussian in the following. Gaussians are 
ubiquitous: for example, the number of heads in a sequence of a thousand coin
tosses, the exact number of oxygen molecules in the room, the height (in inches)
of a randomly selected individual, are all approximately described by a Gaussian
distribution.7 The ubiquity of the Gaussian can be in part traced to the Central
Limit Theorem (CLT) discussed at length below, which states that a phenomenon
resulting from a large number of small independent causes is Gaussian. There
exists however a large number of cases where the distribution describing a complex
phenomenon is not Gaussian: for example, the amplitude of earthquakes, the
velocity differences in a turbulent fluid, the stresses in granular materials, etc., 
and, as we shall discuss in the next chapter, the price fluctuations of most financial 
assets. 


7 Although, in the above three examples, the random variable cannot be negative. As we shall discuss below, the
Gaussian description is generally only valid in a certain neighbourhood of the maximum of the distribution. 




A Gaussian of mean m and root mean square $\sigma$ is defined as:

$$P_G(x) \equiv \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-m)^2}{2\sigma^2}\right). \tag{1.15}$$

The median and most probable value are in this case equal to m, whereas the MAD
(or any other definition of the width) is proportional to the RMS (for example,
$E_{abs} = \sigma\sqrt{2/\pi}$). For m = 0, all the odd moments are zero and the even moments
are given by $m_{2n} = (2n-1)(2n-3)\cdots\sigma^{2n} = (2n-1)!!\,\sigma^{2n}$.
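The even-moment formula also gives a feeling for why high moments are hard to measure, as stated in Section 1.2.3. For N independent Gaussian draws, the empirical moment of order 2n has a relative standard error $\sqrt{(m_{4n}/m_{2n}^2 - 1)/N}$, which is the standard result for the average of independent, identically distributed terms. The sketch below evaluates it; the independence assumption, the 10% target and the function names are ours.

```python
import math

def double_factorial(k):
    """k!! = k (k - 2) (k - 4) ... down to 1, for odd k."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def gaussian_even_moment(n, sigma=1.0):
    """m_{2n} = (2n - 1)!! sigma^{2n} for a Gaussian of zero mean."""
    return double_factorial(2 * n - 1) * sigma ** (2 * n)

def relative_variance(n):
    """Variance of x^{2n} divided by m_{2n}^2 (independent of sigma)."""
    return gaussian_even_moment(2 * n) / gaussian_even_moment(n) ** 2 - 1.0

def relative_error(n, n_samples):
    """Relative standard error of the empirical 2n-th moment from n_samples draws."""
    return math.sqrt(relative_variance(n) / n_samples)

def samples_needed(n, target=0.10):
    """Number of independent draws needed to reach the target relative error."""
    return math.ceil(relative_variance(n) / target ** 2)

for n in (1, 2, 3, 4):
    print(f"moment of order {2 * n}: about {samples_needed(n)} "
          f"independent points for 10% precision")
```

Even in this most favourable, thin-tailed case the required sample size grows quickly with the order of the moment.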

All the cumulants of order greater than two are zero for a Gaussian. This can be 
realized by examining its characteristic function: 


$$\hat{P}_G(z) = \exp\left(-\frac{\sigma^2 z^2}{2} + imz\right). \tag{1.16}$$


Its logarithm is a second-order polynomial, for which all derivatives of order larger 
than two are zero. In particular, the kurtosis of a Gaussian variable is zero. As 
mentioned above, the kurtosis is often taken as a measure of the distance from a 
Gaussian distribution. When $\kappa > 0$ (leptokurtic distributions), the corresponding
distribution density has a marked peak around the mean, and rather ‘thick’ tails.
Conversely, when $\kappa < 0$, the distribution density has a flat top and very thin tails.
For example, the uniform distribution over a certain interval (for which tails are
absent) has a kurtosis $\kappa = -6/5$.
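The value quoted for the uniform distribution can be recovered from its raw moments using exact rational arithmetic. The sketch below does so for the uniform law on [0, 1], whose raw moments are 1/(n + 1), and contrasts it with a thick-tailed example of our own choosing, the symmetric exponential density exp(−|x|)/2, whose even moments are n!. The function name is hypothetical.

```python
from fractions import Fraction

def kurtosis_from_raw_moments(m1, m2, m3, m4):
    """kappa = <(x - m)^4> / sigma^4 - 3, expanded in terms of raw moments."""
    variance = m2 - m1**2
    fourth_central = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return fourth_central / variance**2 - 3

# Uniform on [0, 1]: raw moments m_n = 1 / (n + 1).
uniform_moments = [Fraction(1, n + 1) for n in range(1, 5)]

# Symmetric exponential density exp(-|x|) / 2: odd moments vanish, m_2 = 2, m_4 = 24.
exponential_moments = [Fraction(0), Fraction(2), Fraction(0), Fraction(24)]

kappa_uniform = kurtosis_from_raw_moments(*uniform_moments)
kappa_exponential = kurtosis_from_raw_moments(*exponential_moments)
print(f"uniform: kappa = {kappa_uniform}, symmetric exponential: kappa = {kappa_exponential}")
```

The flat-topped uniform law gives a negative kurtosis, the peaked and thick-tailed exponential law a positive one.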

A Gaussian variable is peculiar because ‘large deviations’ are extremely rare.
The quantity $\exp(-x^2/2\sigma^2)$ decays so fast for large x that deviations of a few times
$\sigma$ are nearly impossible. For example, a Gaussian variable departs from its most
probable value by more than $2\sigma$ only 5% of the time, or more than $3\sigma$ in 0.2% of
the time, whereas a fluctuation of $10\sigma$ has a probability of less than $2 \times 10^{-23}$; in
other words, it never happens.
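These orders of magnitude are easy to reproduce with the complementary error function: for a Gaussian, the probability of a deviation larger than k standard deviations on either side of the mean is erfc(k/√2). A minimal sketch (the function name `prob_beyond` is ours):

```python
import math

def prob_beyond(k):
    """Two-sided probability that a Gaussian variable deviates by more than k sigma."""
    return math.erfc(k / math.sqrt(2.0))

for k in (2, 3, 10):
    print(f"more than {k:>2} sigma: probability {prob_beyond(k):.3e}")
```

The computed values are consistent with the rounded figures quoted in the text.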


1.3.2 Log-normal distribution 


Another very popular distribution in mathematical finance is the so-called ‘log-normal’
law. That X is a log-normal random variable simply means that log X
is normal, or Gaussian. Its use in finance comes from the assumption that the
rates of return, rather than the absolute changes of prices, are independent random
variables. The increments of the logarithm of the price thus asymptotically sum
to a Gaussian, according to the CLT detailed below. The log-normal distribution
density is thus defined as:8

$$P_{LN}(x) \equiv \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\log^2(x/x_0)}{2\sigma^2}\right), \tag{1.17}$$

the moments of which are $m_n = x_0^n e^{n^2\sigma^2/2}$.
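The moment formula follows from a change of variable and a completed square. As a short worked derivation (the auxiliary variable $u = \log(x/x_0)$ is our notation):

```latex
\begin{align*}
m_n &= \int_0^\infty x^n \, P_{LN}(x)\,dx
     = x_0^n \int_{-\infty}^{+\infty} e^{nu}\,
       \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{u^2}{2\sigma^2}\right) du
       \qquad \left(x = x_0 e^{u},\ \frac{dx}{x} = du\right) \\
    &= x_0^n \, e^{n^2\sigma^2/2} \int_{-\infty}^{+\infty}
       \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\left(-\frac{(u - n\sigma^2)^2}{2\sigma^2}\right) du
     = x_0^n \, e^{n^2\sigma^2/2}.
\end{align*}
```

In particular the mean is $x_0 e^{\sigma^2/2}$ and the variance is $m_2 - m_1^2 = x_0^2 e^{\sigma^2}(e^{\sigma^2} - 1)$.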

In the context of mathematical finance, one often prefers log-normal to Gaussian 
distributions for several reasons. As mentioned above, the existence of a random 
rate of return, or random interest rate, naturally leads to log-normal statistics. 
Furthermore, log-normals account for the following symmetry in the problem 
of exchange rates: if x is the rate of currency A in terms of currency B, then
obviously, 1/x is the rate of currency B in terms of A. Under this transformation,
log x becomes -log x and the description in terms of a log-normal distribution
(or in terms of any other even function of log x) is independent of the reference
currency.9 One often hears the following argument in favour of log-normals: since
the price of an asset cannot be negative, its statistics cannot be Gaussian since the
latter admits in principle negative values, whereas a log-normal excludes them by
construction. This is however a red-herring argument, since the description of the
fluctuations of the price of a financial asset in terms of Gaussian or log-normal
statistics is in any case an approximation which is only valid in a certain range.
As we shall discuss at length below, these approximations are totally unadapted
to describe extreme risks. Furthermore, even if a price drop of more than 100%
is in principle possible for a Gaussian process,10 the error caused by neglecting
such an event is much smaller than that induced by the use of either of these two 
distributions (Gaussian or log-normal). In order to illustrate this point more clearly, 
consider the probability of observing n times ‘heads’ in a series of N coin tosses,
which is exactly equal to $2^{-N} C_N^n$. It is also well known that in the neighbourhood
of N/2, $2^{-N} C_N^n$ is very accurately approximated by a Gaussian of variance N/4;
this is however not contradictory with the fact that $n \geq 0$ by construction!
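The coin-tossing illustration can be made quantitative. The sketch below compares the exact probability $2^{-N} C_N^n$ with a Gaussian density of mean N/2 and variance N/4, for an illustrative N = 100 (our choice), both at the centre and at the extreme value n = 0. The function names are ours.

```python
import math

def exact_heads_probability(n, N):
    """Exact probability of n heads in N fair coin tosses: C(N, n) / 2^N."""
    return math.comb(N, n) / 2**N

def gaussian_approximation(n, N):
    """Gaussian density of mean N/2 and variance N/4, evaluated at n."""
    mean, var = N / 2, N / 4
    return math.exp(-((n - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

N = 100
for n in (50, 45, 30, 0):
    exact = exact_heads_probability(n, N)
    approx = gaussian_approximation(n, N)
    print(f"n = {n:>2}: exact = {exact:.3e}, Gaussian = {approx:.3e}, "
          f"ratio = {approx / exact:.3g}")
```

Near N/2 the two agree to a fraction of a per cent, while far in the tail the Gaussian is wrong by many orders of magnitude: the approximation is excellent in the range where it is meant to be used, and the constraint on n is irrelevant there.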

Finally, let us note that for moderate volatilities (up to say 20%), the two
distributions (Gaussian and log-normal) look rather alike, especially in the ‘body’
of the distribution (Fig. 1.3). As for the tails, we shall see below that Gaussians
substantially underestimate their weight, whereas the log-normal predicts that large
positive jumps are more frequent than large negative jumps. This is at variance with
empirical observation: the distributions of absolute stock price changes are rather
symmetrical; if anything, large negative draw-downs are more frequent than large
positive draw-ups.

8 A log-normal distribution has the remarkable property that the knowledge of all its moments is not sufficient
to characterize the corresponding distribution. It is indeed easy to show that the following distribution:
$\frac{1}{\sqrt{2\pi}}\, x^{-1} \exp\left[-\frac{1}{2}(\log x)^2\right] \left[1 + a \sin(2\pi \log x)\right]$, for $|a| \leq 1$, has moments which are independent of the
value of a, and thus coincide with those of a log-normal distribution, which corresponds to a = 0 ([Feller],
p. 227).

9 This symmetry is however not always obvious. The dollar, for example, plays a special role. This symmetry
can only be expected between currencies of similar strength.

10 In the rather extreme case of a 20% annual volatility and a zero annual return, the probability for the price to
become negative after a year in a Gaussian description is less than one out of 3 million.

Fig. 1.3. Comparison between a Gaussian (thick line) and a log-normal (dashed line), with
$m = x_0 = 100$ and $\sigma$ equal to 15 and 15% respectively. The difference between the two
curves shows up in the tails.


1.3.3 Lévy distributions and Paretian tails 


Lévy distributions (noted $L_\mu(x)$ below) appear naturally in the context of the CLT
(see below), because of their stability property under addition (a property shared 
by Gaussians). The tails of Lévy distributions are however much ‘fatter’ than those 
of Gaussians, and are thus useful to describe multiscale phenomena (i.e. when both 
very large and very small values of a quantity can commonly be observed —such 
as personal income, size of pension funds, amplitude of earthquakes or other 
natural catastrophes, etc.). These distributions were introduced in the 1950s and 
1960s by Mandelbrot (following Pareto) to describe personal income and the price 
changes of some financial assets, in particular the price of cotton [Mandelbrot]. 
An important constitutive property of these Lévy distributions is their power-law 
behaviour for large arguments, often called ‘Pareto tails’: 


$$L_\mu(x) \sim \frac{\mu A_\pm^\mu}{|x|^{1+\mu}} \quad \text{for } x \to \pm\infty, \tag{1.18}$$

where $0 < \mu < 2$ is a certain exponent (often called $\alpha$), and $A_\pm^\mu$ two constants
which we call tail amplitudes, or scale parameters: $A_\pm$ indeed gives the order of
magnitude of the large (positive or negative) fluctuations of x. For instance, the
probability to draw a number larger than x decreases as $P_>(x) = (A_+/x)^\mu$ for
large positive x.

One can of course in principle observe Pareto tails with $\mu \ge 2$; but those tails
do not correspond to the asymptotic behaviour of a Lévy distribution.

In full generality, Lévy distributions are characterized by an asymmetry parameter
defined as $\beta \equiv (A_+^\mu - A_-^\mu)/(A_+^\mu + A_-^\mu)$, which measures the relative weight
of the positive and negative tails. We shall mostly focus in the following on the
symmetric case $\beta = 0$. The fully asymmetric case ($\beta = 1$) is also useful to describe
strictly positive random variables, such as, for example, the time during which the
price of an asset remains below a certain value, etc.

An important consequence of Eq. (1.18) with $\mu < 2$ is that the variance of a
Lévy distribution is formally infinite: the probability density does not decay fast
enough for the integral, Eq. (1.6), to converge. In the case $\mu \le 1$, the distribution
density decays so slowly that even the mean, or the MAD, fail to exist.^11 The
scale of the fluctuations, defined by the width of the distribution, is always set by
$A = A_+ = A_-$.

There is unfortunately no simple analytical expression for symmetric Lévy
distributions $L_\mu(x)$, except for $\mu = 1$, which corresponds to a Cauchy distribution
(or ‘Lorentzian’):

$$L_1(x) = \frac{A}{x^2 + \pi^2 A^2}. \tag{1.19}$$

However, the characteristic function of a symmetric Lévy distribution is rather
simple, and reads:

$$\hat{L}_\mu(z) = \exp\left(-a_\mu |z|^\mu\right), \tag{1.20}$$


where $a_\mu$ is a certain constant, proportional to the tail parameter $A^\mu$.^12 It is thus
clear that in the limit $\mu = 2$, one recovers the definition of a Gaussian. When
$\mu$ decreases from 2, the distribution becomes more and more sharply peaked
around the origin and fatter in its tails, while ‘intermediate’ events lose weight
(Fig. 1.4). These distributions thus describe ‘intermittent’ phenomena, very often
small, sometimes gigantic.

Note finally that Eq. (1.20) does not define a probability distribution when $\mu > 2$,
because its inverse Fourier transform is not everywhere positive.
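
Since the density is only known through its characteristic function, one practical route is to invert Eq. (1.20) numerically. The following is a minimal sketch under stated assumptions: the function name `levy_density`, the trapezoidal rule and the cut-off values `z_max` and `dz` are illustrative choices, not a method taken from the text. It uses the fact that, for a symmetric law, $L_\mu(x) = \frac{1}{\pi}\int_0^\infty \cos(zx)\, e^{-a_\mu z^\mu}\, dz$, and compares the result with the two closed forms available: the Cauchy law for $\mu = 1$ and the Gaussian of variance $2a_\mu$ for $\mu = 2$.

```python
import math

def levy_density(x: float, mu: float, a: float = 1.0,
                 z_max: float = 60.0, dz: float = 1e-3) -> float:
    """Symmetric Levy density obtained by inverting exp(-a |z|^mu).

    Uses L_mu(x) = (1/pi) * integral_0^inf cos(z x) exp(-a z^mu) dz with a
    plain trapezoidal rule; z_max and dz are ad hoc numerical choices.
    """
    steps = int(z_max / dz)
    total = 0.5  # integrand equals 1 at z = 0, end-point weight 1/2
    for k in range(1, steps + 1):
        z = k * dz
        weight = 0.5 if k == steps else 1.0
        total += weight * math.cos(z * x) * math.exp(-a * z ** mu)
    return total * dz / math.pi

a, x = 1.0, 0.5
cauchy = a / (math.pi * (a * a + x * x))                    # mu = 1 closed form
gauss = math.exp(-x * x / (4 * a)) / math.sqrt(4 * math.pi * a)  # mu = 2 closed form
print(levy_density(x, 1.0, a), cauchy)
print(levy_density(x, 2.0, a), gauss)
print(levy_density(x, 1.5, a))  # no simple closed form in this case
```

With $a_1 = \pi A$ the Cauchy expression used in the check coincides with Eq. (1.19).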


In the case $\beta \ne 0$, one would have:

$$\hat{L}_\mu^\beta(z) = \exp\left[-a_\mu |z|^\mu \left(1 + i\beta \tan(\mu\pi/2)\, \frac{z}{|z|}\right)\right] \quad (\mu \ne 1). \tag{1.21}$$


11 The median and the most probable value however still exist. For a symmetric Lévy distribution, the most
probable value defines the so-called ‘localization’ parameter m.
12 For example, when $1 < \mu < 2$, $A^\mu = \mu \Gamma(\mu - 1) \sin(\pi\mu/2)\, a_\mu/\pi$.

Fig. 1.4. Shape of the symmetric Lévy distributions with $\mu$ = 0.8, 1.2, 1.6 and 2 (this
last value actually corresponds to a Gaussian). The smaller $\mu$, the sharper the ‘body’ of the
distribution, and the fatter the tails, as illustrated in the inset.


It is important to notice that while the leading asymptotic term for large x is 
given by Eq. (1.18), there are subleading terms which can be important for finite x. 
The full asymptotic series actually reads: 


$$L_\mu(x) = \frac{1}{\pi} \sum_{n=1}^{\infty} \frac{(-)^{n+1}}{n!}\, \frac{a_\mu^n}{x^{1+n\mu}}\, \Gamma(1 + n\mu) \sin(\pi\mu n/2). \tag{1.22}$$


The presence of the subleading terms may lead to a bad empirical estimate of
the exponent $\mu$ based on a fit of the tail of the distribution. In particular, the
‘apparent’ exponent which describes the function $L_\mu$ for finite x is larger than
$\mu$, and decreases towards $\mu$ for $x \to \infty$, but more and more slowly as $\mu$ gets
nearer to the Gaussian value $\mu = 2$, for which the power-law tails no longer exist.
Note however that one also often observes empirically the opposite behaviour, i.e.
an apparent Pareto exponent which grows with x. This arises when the Pareto
distribution, Eq. (1.18), is only valid in an intermediate regime $x \ll 1/\alpha$, beyond
which the distribution decays exponentially, say as $\exp(-\alpha x)$. The Pareto tail is
then ‘truncated’ for large values of x, and this leads to an effective $\mu$ which grows
with x.

An interesting generalization of the Lévy distributions which accounts for this
exponential cut-off is given by the ‘truncated Lévy distributions’ (TLD), which will
be of much use in the following. A simple way to alter the characteristic function
Eq. (1.20) to account for an exponential cut-off for large arguments is to set:^13

$$\hat{L}_\mu^{(t)}(z) = \exp\left[-a_\mu\, \frac{(\alpha^2 + z^2)^{\mu/2} \cos\left(\mu \arctan(|z|/\alpha)\right) - \alpha^\mu}{\cos(\pi\mu/2)}\right], \tag{1.23}$$

for $1 \le \mu \le 2$. The above form reduces to Eq. (1.20) for $\alpha = 0$. Note that the
argument in the exponential can also be written as:

$$-\frac{a_\mu}{2\cos(\pi\mu/2)} \left[(\alpha + iz)^\mu + (\alpha - iz)^\mu - 2\alpha^\mu\right]. \tag{1.24}$$


Exponential tail: a limiting case 


Very often in the following, we shall notice that in the formal limit $\mu \to \infty$, the power-
law tail becomes an exponential tail, if the tail parameter is simultaneously scaled as
$A^\mu = (\mu/\alpha)^\mu$. Qualitatively, this can be understood as follows: consider a probability
distribution restricted to positive x, which decays as a power-law for large x, defined as:

$$P_>(x) = \frac{A^\mu}{(A + x)^\mu}. \tag{1.25}$$

This shape is obviously compatible with Eq. (1.18), and is such that $P_>(x = 0) = 1$. If
$A = (\mu/\alpha)$, one then finds:

$$P_>(x) = \frac{1}{\left(1 + \alpha x/\mu\right)^\mu} \;\underset{\mu \to \infty}{\longrightarrow}\; \exp(-\alpha x). \tag{1.26}$$
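
The limit in Eq. (1.26) is easy to observe numerically. The short sketch below (function name and parameter values are illustrative assumptions) evaluates the power-law tail with $A = \mu/\alpha$ for growing $\mu$ and compares it with $\exp(-\alpha x)$.

```python
import math

def power_law_tail(x: float, mu: float, alpha: float) -> float:
    """P_>(x) = (1 + alpha x / mu)^(-mu), i.e. Eq. (1.25) with A = mu / alpha."""
    return (1.0 + alpha * x / mu) ** (-mu)

alpha, x = 1.0, 3.0  # illustrative values
for mu in (2, 10, 100, 10_000):
    print(mu, power_law_tail(x, mu, alpha), math.exp(-alpha * x))
```

For small $\mu$ the power-law tail is much heavier than the exponential one; the two only merge when $\mu$ becomes large.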


1.3.4 Other distributions (*) 


There are obviously a very large number of other statistical distributions useful to
describe random phenomena. Let us cite a few, which often appear in a financial
context:


• The discrete Poisson distribution: consider a set of points randomly scattered
on the real axis, with a certain density $\omega$ (e.g. the times when the price of an
asset changes). The number of points n in an arbitrary interval of length $\ell$ is
distributed according to the Poisson distribution:

$$P(n) \equiv \frac{(\omega\ell)^n}{n!} \exp(-\omega\ell). \tag{1.27}$$

• The hyperbolic distribution, which interpolates between a Gaussian ‘body’ and
exponential tails:

$$P_H(x) \equiv \frac{1}{2 x_0 K_1(\alpha x_0)} \exp\left[-\alpha \sqrt{x_0^2 + x^2}\right], \tag{1.28}$$


where the normalization $K_1(\alpha x_0)$ is a modified Bessel function of the second
kind. For x small compared to $x_0$, $P_H(x)$ behaves as a Gaussian although its
asymptotic behaviour for $x \gg x_0$ is fatter and reads $\exp(-\alpha|x|)$.

13 See I. Koponen, Analytic approach to the problem of convergence to truncated Lévy flights towards the
Gaussian stochastic process, Physical Review E, 52, 1197 (1995).
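From the characteristic function

$$\hat{P}_H(z) = \frac{\alpha\, K_1\left(x_0 \sqrt{\alpha^2 + z^2}\right)}{K_1(\alpha x_0)\, \sqrt{\alpha^2 + z^2}}, \tag{1.29}$$

we can compute the variance

$$\sigma^2 = \frac{x_0 K_2(\alpha x_0)}{\alpha K_1(\alpha x_0)}, \tag{1.30}$$

and kurtosis

$$\kappa = 3 \left(\frac{x_0}{\sigma^2 \alpha}\right)^2 + \frac{12}{\alpha x_0}\, \frac{K_1(\alpha x_0)}{K_2(\alpha x_0)} - 3. \tag{1.31}$$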

Note that the kurtosis of the hyperbolic distribution is always between zero and
three. In the case $x_0 = 0$, one finds the symmetric exponential distribution:

$$P_E(x) = \frac{\alpha}{2} \exp(-\alpha|x|), \tag{1.32}$$

with even moments $m_{2n} = (2n)!\, \alpha^{-2n}$, which gives $\sigma^2 = 2\alpha^{-2}$ and $\kappa = 3$. Its
characteristic function reads: $\hat{P}_E(z) = \alpha^2/(\alpha^2 + z^2)$.


• The Student distribution, which also has power-law tails:

$$P_S(x) \equiv \frac{1}{\sqrt{\pi}}\, \frac{\Gamma((1 + \mu)/2)}{\Gamma(\mu/2)}\, \frac{a^\mu}{(a^2 + x^2)^{(1+\mu)/2}}, \tag{1.33}$$

which coincides with the Cauchy distribution for $\mu = 1$, and tends towards a
Gaussian in the limit $\mu \to \infty$, provided that $a^2$ is scaled as $\mu$. The even moments
of the Student distribution read: $m_{2n} = (2n - 1)!!\, \Gamma(\mu/2 - n)/\Gamma(\mu/2)\, (a^2/2)^n$,
provided $2n < \mu$; and are infinite otherwise. One can check that in the limit
$\mu \to \infty$, the above expression gives back the moments of a Gaussian: $m_{2n} =
(2n - 1)!!\, \sigma^{2n}$. Figure 1.5 shows a plot of the Student distribution with $\kappa = 1$,
corresponding to $\mu = 10$.


1.4 Maximum of random variables — statistics of extremes 


If one observes a series of N independent realizations of the same random
phenomenon, a question which naturally arises, in particular when one is concerned
about risk control, is to determine the order of magnitude of the maximum observed
value of the random variable (which can be the price drop of a financial asset, or
the water level of a flooding river, etc.). For example, in Chapter 3, the so-called
‘value-at-risk’ (VaR) on a typical time horizon will be defined as the possible
maximum loss over that period (within a certain confidence level).

Fig. 1.5. Probability density for the truncated Lévy ($\mu = 3/2$), Student and hyperbolic
distributions. All three have two free parameters which were fixed to have unit variance
and kurtosis. The inset shows a blow-up of the tails where one can see that the Student
distribution has tails similar to (but slightly thicker than) those of the truncated Lévy.


The law of large numbers tells us that an event which has a probability p of
occurrence appears on average Np times on a series of N observations. One thus
expects to observe events which have a probability of at least 1/N. It would be
surprising to encounter an event which has a probability much smaller than 1/N.
The order of magnitude of the largest event, $\Lambda_{\max}$, observed in a series of N
independent identically distributed (iid) random variables is thus given by:

$$P_>(\Lambda_{\max}) = 1/N. \tag{1.34}$$


More precisely, the full probability distribution of the maximum value $x_{\max} =
\max_{i=1,N}\{x_i\}$ is relatively easy to characterize; this will justify the above simple
criterion Eq. (1.34). The cumulative distribution $P(x_{\max} < \Lambda)$ is obtained by
noticing that if the maximum of all $x_i$'s is smaller than $\Lambda$, all of the $x_i$'s must
be smaller than $\Lambda$. If the random variables are iid, one finds:

$$P(x_{\max} < \Lambda) = [P_<(\Lambda)]^N. \tag{1.35}$$

Note that this result is general, and does not rely on a specific choice for P(x).
When $\Lambda$ is large, it is useful to use the following approximation:

$$P(x_{\max} < \Lambda) = [1 - P_>(\Lambda)]^N \simeq e^{-N P_>(\Lambda)}. \tag{1.36}$$


Since we now have a simple formula for the distribution of $x_{\max}$, one can invert
it in order to obtain, for example, the median value of the maximum, noted $\Lambda_{\text{med}}$,
such that $P(x_{\max} < \Lambda_{\text{med}}) = 1/2$:

$$P_>(\Lambda_{\text{med}}) = 1 - \left(\tfrac{1}{2}\right)^{1/N} \simeq \frac{\log 2}{N}. \tag{1.37}$$

More generally, the value $\Lambda_p$ which is greater than $x_{\max}$ with probability p is given
by

$$P_>(\Lambda_p) \simeq -\frac{\log p}{N}. \tag{1.38}$$


The quantity $\Lambda_{\max}$ defined by Eq. (1.34) above is thus such that $p = 1/e \simeq 0.37$.
The probability that $x_{\max}$ is even larger than $\Lambda_{\max}$ is thus 63%. As we shall now
show, $\Lambda_{\max}$ also corresponds, in many cases, to the most probable value of $x_{\max}$.

Equation (1.38) will be very useful in Chapter 3 to estimate a maximal potential
loss within a certain confidence level. For example, the largest daily loss $\Lambda$
expected next year, with 95% confidence, is defined such that $P_<(-\Lambda) =
-\log(0.95)/250$, where $P_<$ is the cumulative distribution of daily price changes,
and 250 is the number of market days per year.
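
As a worked illustration of Eq. (1.38), the sketch below computes the tail probability $-\log(0.95)/250$ and converts it into a loss level. The conversion ASSUMES zero-mean Gaussian daily price changes, purely for illustration; this is precisely the kind of assumption the chapter warns against for extreme risks, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def tail_probability_for_max(p: float, n_obs: int) -> float:
    """Right-hand side of Eq. (1.38): P_>(Lambda_p) ~ -log(p) / N."""
    return -math.log(p) / n_obs

def gaussian_largest_loss(sigma_daily: float, confidence: float = 0.95,
                          days: int = 250) -> float:
    """Loss level Lambda with P_<(-Lambda) = -log(confidence) / days.

    ASSUMPTION: daily price changes are Gaussian with zero mean and
    standard deviation sigma_daily (illustration only).
    """
    tail = tail_probability_for_max(confidence, days)
    return -NormalDist(0.0, sigma_daily).inv_cdf(tail)

print(tail_probability_for_max(0.95, 250))   # about 2e-4
print(gaussian_largest_loss(1.0))            # in units of the daily sigma
```

Under this Gaussian assumption the largest daily loss over a year, at 95% confidence, is about three and a half daily standard deviations; a fat-tailed $P_<$ would give a larger figure for the same tail probability.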

Interestingly, the distribution of $x_{\max}$ only depends, when N is large, on the
asymptotic behaviour of the distribution of x, P(x), when $x \to \infty$. For example,
if P(x) behaves as an exponential when $x \to \infty$, or more precisely if $P_>(x) \sim
\exp(-\alpha x)$, one finds:

$$\Lambda_{\max} = \frac{\log N}{\alpha}, \tag{1.39}$$

which grows very slowly with N.^14 Setting $x_{\max} = \Lambda_{\max} + (u/\alpha)$, one finds that
the deviation u around $\Lambda_{\max}$ is distributed according to the Gumbel distribution:

$$P(u) = e^{-e^{-u}}\, e^{-u}. \tag{1.40}$$


The most probable value of this distribution is u = 0.^15 This shows that $\Lambda_{\max}$
is the most probable value of $x_{\max}$. The result, Eq. (1.40), is actually much more
general, and is valid as soon as P(x) decreases more rapidly than any power-law
for $x \to \infty$: the deviation between $\Lambda_{\max}$ (defined as Eq. (1.34)) and $x_{\max}$ is always
distributed according to the Gumbel law, Eq. (1.40), up to a scaling factor in the
definition of u.


The situation is radically different if P(x) decreases as a power-law, cf.
Eq. (1.18). In this case,

$$P_>(x) \simeq \frac{A_+^\mu}{x^\mu}, \tag{1.41}$$

and the typical value of the maximum is given by:

$$\Lambda_{\max} = A_+ N^{1/\mu}. \tag{1.42}$$

14 For example, for a symmetric exponential distribution $P(x) = \exp(-|x|)/2$, the median value of the
maximum of N = 10 000 variables is only 6.3.
15 This distribution is discussed further in the context of financial risk control in Section 3.1.2, and drawn in
Figure 3.1.


Numerically, for a distribution with $\mu = 3/2$ and a scale factor $A_+ = 1$, the largest of
N = 10 000 variables is on the order of 450, whereas for $\mu = 1/2$ it is one hundred
million! The complete distribution of the maximum, called the Fréchet distribution,
is given by:

$$P(u) = \frac{\mu}{u^{1+\mu}}\, e^{-1/u^\mu}, \qquad u = \frac{x_{\max}}{A_+ N^{1/\mu}}. \tag{1.43}$$

Its asymptotic behaviour for $u \to \infty$ is still a power-law of exponent $1 + \mu$. Said
differently, both power-law tails and exponential tails are stable with respect to the
‘max’ operation.^16 The most probable value of $x_{\max}$ is now equal to $(\mu/(1+\mu))^{1/\mu} \Lambda_{\max}$.
As mentioned above, the limit $\mu \to \infty$ formally corresponds to an exponential
distribution. In this limit, one indeed recovers $\Lambda_{\max}$ as the most probable value.
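
The orders of magnitude quoted above follow directly from Eqs (1.39) and (1.42). The sketch below (function names are illustrative assumptions) evaluates the typical maximum of N = 10 000 variables for power-law tails with exponents 3/2 and 1/2, which reproduce the quoted values of roughly 450 and one hundred million, and contrasts them with the logarithmic growth found for an exponential tail.

```python
import math

def typical_max_power_law(n_obs: int, mu: float, amplitude: float = 1.0) -> float:
    """Eq. (1.42): Lambda_max = A_+ N^(1/mu)."""
    return amplitude * n_obs ** (1.0 / mu)

def typical_max_exponential(n_obs: int, alpha: float = 1.0) -> float:
    """Eq. (1.39): Lambda_max = log(N) / alpha."""
    return math.log(n_obs) / alpha

def frechet_mode(n_obs: int, mu: float, amplitude: float = 1.0) -> float:
    """Most probable maximum: (mu / (1 + mu))^(1/mu) * Lambda_max."""
    return (mu / (1.0 + mu)) ** (1.0 / mu) * typical_max_power_law(n_obs, mu, amplitude)

N = 10_000
print(typical_max_power_law(N, 1.5))   # a few hundred
print(typical_max_power_law(N, 0.5))   # one hundred million
print(typical_max_exponential(N))      # below ten
print(frechet_mode(N, 1.5))
```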


Equation (1.42) allows us to discuss intuitively the divergence of the mean value for
$\mu \le 1$ and of the variance for $\mu \le 2$. If the mean value exists, the sum of N random
variables is typically equal to Nm, where m is the mean (see also below). But when $\mu < 1$,
the largest encountered value of X is on the order of $N^{1/\mu} \gg N$, and would thus be larger
than the entire sum. Similarly, as discussed below, when the variance exists, the RMS of
the sum is equal to $\sigma\sqrt{N}$. But for $\mu < 2$, $x_{\max}$ grows faster than $\sqrt{N}$.


More generally, one can rank the random variables $x_i$ in decreasing order, and
ask for an estimate of the nth encountered value, noted $\Lambda[n]$ below. (In particular,
$\Lambda[1] = x_{\max}$.) The distribution $P_n$ of $\Lambda[n]$ can be obtained in full generality as:

$$P_n(\Lambda[n]) = N\, C_{N-1}^{n-1}\, P(x = \Lambda[n])\, \big(P(x > \Lambda[n])\big)^{n-1} \big(P(x < \Lambda[n])\big)^{N-n}. \tag{1.44}$$

The previous expression means that one has first to choose $\Lambda[n]$ among N variables
(N ways), $n - 1$ variables among the $N - 1$ remaining as the $n - 1$ largest ones
($C_{N-1}^{n-1}$ ways), and then assign the corresponding probabilities to the configuration
where $n - 1$ of them are larger than $\Lambda[n]$ and $N - n$ are smaller than $\Lambda[n]$. One can
study the position $\Lambda^*[n]$ of the maximum of $P_n$, and also the width of $P_n$, defined
from the second derivative of $\log P_n$ calculated at $\Lambda^*[n]$. The calculation simplifies
in the limit where $N \to \infty$, $n \to \infty$, with the ratio n/N fixed. In this limit, one
finds a relation which generalizes Eq. (1.34):

$$P_>(\Lambda^*[n]) = n/N. \tag{1.45}$$


16 A third class of laws stable under ‘max’ concerns random variables which are bounded from above, i.e. such
that P(x) = 0 for $x > x_M$, with $x_M$ finite. This leads to the Weibull distributions, which we will not consider
further in this book.

The width $w_n$ of the distribution is found to be given by:

$$w_n = \frac{1}{\sqrt{N}}\, \frac{\sqrt{(n/N)\,(1 - n/N)}}{P(x = \Lambda^*[n])}, \tag{1.46}$$

which shows that in the limit $N \to \infty$, the value of the nth variable is more and
more sharply peaked around its most probable value $\Lambda^*[n]$, given by Eq. (1.45).

In the case of an exponential tail, one finds that $\Lambda^*[n] \simeq \log(N/n)/\alpha$; whereas
in the case of power-law tails, one rather obtains:

$$\Lambda^*[n] \simeq A_+ \left(\frac{N}{n}\right)^{1/\mu}. \tag{1.47}$$


This last equation shows that, for power-law variables, the encountered values are
hierarchically organized: for example, the ratio of the largest value $x_{\max} \equiv \Lambda[1]$ to
the second largest $\Lambda[2]$ is of the order of $2^{1/\mu}$, which becomes larger and larger as
$\mu$ decreases, and conversely tends to one when $\mu \to \infty$.

The property, Eq. (1.47), is very useful in identifying empirically the nature
of the tails of a probability distribution. One sorts in decreasing order the set of
observed values $\{x_1, x_2, \ldots, x_N\}$ and one simply draws $\Lambda[n]$ as a function of n.
If the variables are power-law distributed, this graph should be a straight line in
log-log plot, with a slope $-1/\mu$, as given by Eq. (1.47) (Fig. 1.6). On the same
figure, we have shown the result obtained for exponentially distributed variables.
On this diagram, one observes an approximately straight line, but with an effective
slope which varies with the total number of points N: the slope is less and less as
N/n grows larger. In this sense, the formal remark made above, that an exponential
distribution could be seen as a power-law with $\mu \to \infty$, becomes somewhat
more concrete. Note that if the axes x and y of Figure 1.6 are interchanged, then
according to Eq. (1.45), one obtains an estimate of the cumulative distribution, $P_>$.

Let us finally note another property of power-laws, potentially interesting for their
empirical determination. If one computes the average value of x conditioned to a certain
minimum value $\Lambda$:

$$\langle x \rangle_\Lambda = \frac{\int_\Lambda^\infty x P(x)\, dx}{\int_\Lambda^\infty P(x)\, dx}, \tag{1.48}$$

then, if P(x) decreases as in Eq. (1.18), one finds, for $\Lambda \to \infty$,

$$\langle x \rangle_\Lambda = \frac{\mu}{\mu - 1}\, \Lambda, \tag{1.49}$$

independently of the tail amplitude $A_+^\mu$.^17 The average $\langle x \rangle_\Lambda$ is thus always of the same
order as $\Lambda$ itself, with a proportionality factor which diverges as $\mu \to 1$.

17 This means that $\mu$ can be determined by a one parameter fit only.

Fig. 1.6. Amplitude versus rank plots. One plots the value of the nth variable $\Lambda[n]$ as a
function of its rank n. If P(x) behaves asymptotically as a power-law, one obtains a straight
line in log-log coordinates, with a slope equal to $-1/\mu$. For an exponential distribution,
one observes an effective slope which is smaller and smaller as N/n tends to infinity. The
points correspond to synthetic time series of length 5000, drawn according to a power-law
with $\mu = 3$, or according to an exponential. Note that if the axes x and y are interchanged,
then according to Eq. (1.45), one obtains an estimate of the cumulative distribution, $P_>$.


1.5 Sums of random variables 


In order to describe the statistics of future prices of a financial asset, one a 
priori needs a distribution density for all possible time intervals, corresponding 
to different trading time horizons. For example, the distribution of 5-min price 
fluctuations is different from the one describing daily fluctuations, itself different 
for the weekly, monthly, etc. variations. But in the case where the fluctuations are 
independent and identically distributed (iid), an assumption which is, however, 
usually not justified, see Sections 1.7 and 2.4, it is possible to reconstruct the 
distributions corresponding to different time scales from the knowledge of that 
describing short time scales only. In this context, Gaussians and Lévy distributions 
play a special role, because they are stable: if the short time scale distribution is 
a stable law, then the fluctuations on all time scales are described by the same 
stable law —only the parameters of the stable law must be changed (in particular its 
width). More generally, if one sums iid variables, then, independently of the short
time distribution, the law describing long times converges towards one of the stable
laws: this is the content of the ‘central limit theorem’ (CLT). In practice, however,
this convergence can be very slow and thus of limited interest, in particular if one 
is concerned about short time scales. 




1.5.1 Convolutions 


What is the distribution of the sum of two independent random variables? This sum
can, for example, represent the variation of price of an asset between today and
the day after tomorrow (X), which is the sum of the increment between today
and tomorrow ($X_1$) and between tomorrow and the day after tomorrow ($X_2$), both
assumed to be random and independent.

Let us thus consider $X = X_1 + X_2$ where $X_1$ and $X_2$ are two random variables,
independent, and distributed according to $P_1(x_1)$ and $P_2(x_2)$, respectively. The
probability that X is equal to x (within dx) is given by the sum over all possibilities
of obtaining X = x (that is all combinations of $X_1 = x_1$ and $X_2 = x_2$ such that
$x_1 + x_2 = x$), weighted by their respective probabilities. The variables $X_1$ and $X_2$
being independent, the joint probability that $X_1 = x_1$ and $X_2 = x - x_1$ is equal to
$P_1(x_1) P_2(x - x_1)$, from which one obtains:

$$P(x, N = 2) = \int P_1(x')\, P_2(x - x')\, dx'. \tag{1.50}$$


This equation defines the convolution between $P_1(x)$ and $P_2(x)$, which we shall
write $P = P_1 \star P_2$. The generalization to the sum of N independent random
variables is immediate. If $X = X_1 + X_2 + \cdots + X_N$ with $X_i$ distributed according
to $P_i(x_i)$, the distribution of X is obtained as:

$$P(x, N) = \int P_1(x'_1) \cdots P_{N-1}(x'_{N-1})\, P_N(x - x'_1 - \cdots - x'_{N-1}) \prod_{i=1}^{N-1} dx'_i. \tag{1.51}$$

One thus understands how powerful is the hypothesis that the increments are iid,
i.e. that $P_1 = P_2 = \cdots = P_N$. Indeed, according to this hypothesis, one only needs
to know the distribution of increments over a unit time interval to reconstruct that
of increments over an interval of length N: it is simply obtained by convoluting the
elementary distribution N times with itself.
The analytical or numerical manipulations of Eqs (1.50) and (1.51) are much eased by the
use of Fourier transforms, for which convolutions become simple products. The equation
$P(x, N = 2) = [P_1 \star P_2](x)$ reads in Fourier space:

$$\hat{P}(z, N = 2) = \int e^{iz(x - x' + x')} \int P_1(x')\, P_2(x - x')\, dx'\, dx \equiv \hat{P}_1(z)\, \hat{P}_2(z). \tag{1.52}$$


In order to obtain the Nth convolution of a function with itself, one should raise its 
characteristic function to the power N, and then take its inverse Fourier transform. 
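
For integer-valued variables the convolution integral becomes a finite sum, which makes the construction easy to try out. The sketch below is a discrete analogue of Eqs (1.50) and (1.51) and performs the convolution directly rather than through Fourier transforms; the list-based representation and the function names are illustrative assumptions.

```python
def convolve(p, q):
    """Distribution of X1 + X2 for independent integer-valued variables.

    p[k] is the probability that X1 = k (k = 0, 1, ...), and similarly for q;
    this is the discrete counterpart of Eq. (1.50).
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

def n_fold(p, n):
    """Distribution of the sum of n iid variables, each distributed as p."""
    result = [1.0]  # empty sum: all the weight sits on the value 0
    for _ in range(n):
        result = convolve(result, p)
    return result

coin = [0.5, 0.5]          # one fair coin toss: 0 or 1 with equal weight
print(n_fold(coin, 4))     # binomial weights C(4, k) / 16
```

Convoluting the one-toss law four times with itself gives back the binomial weights, in line with the remark that the elementary distribution alone determines the law of the sum.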


1.5.2 Additivity of cumulants and of tail amplitudes 


It is clear that the mean of the sum of two random variables (independent or not)
is equal to the sum of the individual means. The mean is thus additive under
convolution. Similarly, if the random variables are independent, one can show that
their variances (when they both exist) are also additive. More generally, all the
cumulants ($c_n$) of two independent distributions simply add. This follows from
the fact that since the characteristic functions multiply, their logarithms add. The
additivity of cumulants is then a simple consequence of the linearity of derivation.
The cumulants of a given law convoluted N times with itself thus follow the
simple rule $c_{n,N} = N c_{n,1}$, where the $\{c_{n,1}\}$ are the cumulants of the elementary
distribution $P_1$. Since the cumulant $c_n$ has the dimension of X to the power n, its
relative importance is best measured in terms of the normalized cumulants:

$$\lambda_n^N \equiv \frac{c_{n,N}}{(c_{2,N})^{n/2}} = \frac{c_{n,1}}{(c_{2,1})^{n/2}}\, N^{1 - n/2}. \tag{1.53}$$

The normalized cumulants thus decay with N for n > 2: the higher the cumulant,
the faster the decay: $\lambda_n^N \propto N^{1 - n/2}$. The kurtosis $\kappa$, defined above as the fourth
normalized cumulant, thus decreases as 1/N. This is basically the content of
the CLT: when N is very large, the cumulants of order > 2 become negligible.
Therefore, the distribution of the sum is only characterized by its first two
cumulants (mean and variance): it is a Gaussian.
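
The 1/N decay of the kurtosis can be verified exactly on a small discrete example. In the sketch below, the elementary distribution (weights 0.1, 0.8, 0.1 on the values 0, 2, 4) and the function names are arbitrary illustrative choices; its kurtosis is 2, and the N-fold convolution is computed by direct summation.

```python
def convolve(p, q):
    """Distribution of the sum of two independent integer-valued variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out

def n_fold(p, n):
    """Distribution of the sum of n iid variables distributed as p."""
    result = [1.0]
    for _ in range(n):
        result = convolve(result, p)
    return result

def mean_var_kurtosis(p):
    """Mean, variance and excess kurtosis of a distribution on 0, 1, 2, ..."""
    mean = sum(k * p_k for k, p_k in enumerate(p))
    var = sum((k - mean) ** 2 * p_k for k, p_k in enumerate(p))
    m4 = sum((k - mean) ** 4 * p_k for k, p_k in enumerate(p))
    return mean, var, m4 / var ** 2 - 3.0

elementary = [0.1, 0.0, 0.8, 0.0, 0.1]   # illustrative law with kurtosis 2
for n in (1, 2, 4, 8):
    mean, var, kappa = mean_var_kurtosis(n_fold(elementary, n))
    print(n, mean, var, kappa)            # mean and variance grow as n, kappa as 1/n
```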

Let us now turn to the case where the elementary distribution $P_1(x_1)$ decreases
as a power-law for large arguments $x_1$ (cf. Eq. (1.18)), with a certain exponent
$\mu$. The cumulants of order higher than $\mu$ are thus divergent. By studying the
small z singular expansion of the Fourier transform of P(x, N), one finds that
the above additivity property of cumulants is bequeathed to the tail amplitudes $A_\pm^\mu$:
the asymptotic behaviour of the distribution of the sum P(x, N) still behaves as a
power-law (which is thus conserved by addition for all values of $\mu$, provided one
takes the limit $x \to \infty$ before $N \to \infty$; see the discussion in Section 1.6.3), with
a tail amplitude given by:

$$A_{\pm,N}^\mu \equiv N A_\pm^\mu. \tag{1.54}$$


The tail parameter thus plays the role, for power-law variables, of a generalized 
cumulant. 


1.5.3 Stable distributions and self-similarity 


If one adds random variables distributed according to an arbitrary law $P_1(x_1)$,
one constructs a random variable which has, in general, a different probability
distribution ($P(x, N) = [P_1(x_1)]^{\star N}$). However, for certain special distributions,
the law of the sum has exactly the same shape as the elementary distribution: these
are called stable laws. The fact that two distributions have the ‘same shape’ means
that one can find a (N-dependent) translation and dilation of x such that the two
laws coincide:

$$P(x, N)\, dx = P_1(x_1)\, dx_1 \quad \text{where} \quad x = a_N x_1 + b_N. \tag{1.55}$$


The distribution of increments on a certain time scale (week, month, year) is thus 
scale invariant, provided the variable X is properly rescaled. In this case, the 
chart giving the evolution of the price of a financial asset as a function of time 
has the same statistical structure, independently of the chosen elementary time 
scale—only the average slope and the amplitude of the fluctuations are different. 
These charts are then called self-similar, or, using a better terminology introduced
by Mandelbrot, self-affine (Figs 1.7 and 1.8). 
The family of all possible stable laws coincides (for continuous variables) with
the Lévy distributions defined above,^18 which include Gaussians as the special
case $\mu = 2$. This is easily seen in Fourier space, using the explicit shape of
the characteristic function of the Lévy distributions. We shall specialize here for
simplicity to the case of symmetric distributions $P_1(x_1) = P_1(-x_1)$, for which
the translation factor is zero ($b_N \equiv 0$). The scale parameter is then given by
$a_N = N^{1/\mu}$,^19 and one finds, for $\mu < 2$:

$$\langle |x|^q \rangle^{1/q} \propto A N^{1/\mu}, \qquad q < \mu, \tag{1.56}$$

where $A = A_+ = A_-$. In words, the above equation means that the order of
magnitude of the fluctuations on ‘time’ scale N is a factor $N^{1/\mu}$ larger than the
fluctuations on the elementary time scale. However, once this factor is taken into
account, the probability distributions are identical. One should notice that the smaller
the value of $\mu$, the faster the growth of fluctuations with time.


1.6 Central limit theorem 


We have thus seen that the stable laws (Gaussian and Lévy distributions) are ‘fixed
points’ of the convolution operation. These fixed points are actually also attractors, 
in the sense that any distribution convoluted with itself a large number of times 
finally converges towards a stable law (apart from some very pathological cases). 
Said differently, the limit distribution of the sum of a large number of random 
variables is a stable law. The precise formulation of this result is known as the 
central limit theorem (CLT). 


18 For discrete variables, one should also add the Poisson distribution Eq. (1.27).
19 The case $\mu = 1$ is special and involves extra logarithmic factors.

Fig. 1.7. Example of a self-affine function, obtained by summing random variables. One
plots the sum x as a function of the number of terms N in the sum, for a Gaussian
elementary distribution $P_1(x_1)$. Several successive ‘zooms’ reveal the self-similar nature
of the function, here with $a_N = N^{1/2}$.


1.6.1 Convergence to a Gaussian 


The classical formulation of the CLT deals with the convergence of sums of iid
random variables of finite variance $\sigma^2$ towards a Gaussian. In a more precise way,
the result is then the following:

$$\lim_{N \to \infty} P\left(u_1 \le \frac{x - mN}{\sigma\sqrt{N}} \le u_2\right) = \int_{u_1}^{u_2} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du, \tag{1.57}$$

for all finite $u_1$, $u_2$. Note however that for finite N, the distribution of the sum $X =
X_1 + \cdots + X_N$ in the tails (corresponding to extreme events) can be very different
from the Gaussian prediction; but the weight of these non-Gaussian regions tends
to zero when N goes to infinity. The CLT only concerns the central region, which
keeps a finite weight for N large: we shall come back in detail to this point below.

Fig. 1.8. In this case, the elementary distribution $P_1(x_1)$ decreases as a power-law with an
exponent $\mu = 1.5$. The scale factor is now given by $a_N = N^{2/3}$. Note that, contrarily to
the previous graph, one clearly observes the presence of sudden ‘jumps’, which reflect the
existence of very large values of the elementary increment $x_1$.


The main hypotheses ensuring the validity of the Gaussian CLT are the following:

• The $X_i$ must be independent random variables, or at least not ‘too’ correlated
(the correlation function $\langle x_i x_j \rangle - m^2$ must decay sufficiently fast when $|i - j|$
becomes large, see Section 1.7.1 below). For example, in the extreme case where
all the $X_i$ are perfectly correlated (i.e. they are all equal), the distribution of X
is obviously the same as that of the individual $X_i$ (once the factor N has been
properly taken into account).

• The random variables $X_i$ need not necessarily be identically distributed. One
must however require that the variances of all these distributions are not too
dissimilar, so that no one of the variances dominates over all the others (as
would be the case, for example, if the variances were themselves distributed as
a power-law with an exponent $\mu < 1$). In this case, the variance of the Gaussian
limit distribution is the average of the individual variances. This also allows one
to deal with sums of the type $X = p_1 X_1 + p_2 X_2 + \cdots + p_N X_N$, where the $p_i$ are
arbitrary coefficients; this case is relevant in many circumstances, in particular
in portfolio theory (cf. Chapter 3).


• Formally, the CLT only applies in the limit where N is infinite. In practice,
N must be large enough for a Gaussian to be a good approximation of the
distribution of the sum. The minimum required value of N (called N* below)
depends on the elementary distribution $P_1(x_1)$ and its distance from a Gaussian.
Also, N* depends on how far in the tails one requires a Gaussian to be a good
approximation, which takes us to the next point.


• As mentioned above, the CLT does not tell us anything about the tails of the
distribution of X; only the central part of the distribution is well described by
a Gaussian. The ‘central’ region means a region of width at least of the order
of $\sqrt{N}\sigma$ around the mean value of X. The actual width of the region where
the Gaussian turns out to be a good approximation for large finite N crucially
depends on the elementary distribution $P_1(x_1)$. This problem will be explored in
Section 1.6.3. Roughly speaking, this region is of width $\sim N^{3/4}\sigma$ for ‘narrow’
symmetric elementary distributions, such that all even moments are finite. This
region is however sometimes of much smaller extension: for example, if $P_1(x_1)$
has power-law tails with $\mu > 2$ (such that $\sigma$ is finite), the Gaussian ‘realm’
grows barely faster than $\sqrt{N}$ (as $\sim \sqrt{N \log N}$).


The above formulation of the CLT requires the existence of a finite variance. This
condition can be somewhat weakened to include some ‘marginal’ distributions such as
a power-law with $\mu = 2$. In this case the scale factor is not $a_N = \sqrt{N}$ but rather
$a_N = \sqrt{N \log N}$. However, as we shall discuss in the next section, elementary distributions
which decay more slowly than $|x|^{-3}$ do not belong to the Gaussian basin of attraction.
More precisely, the necessary and sufficient condition for $P_1(x_1)$ to belong to this basin is
that:

$$\lim_{u\to\infty} u^2\,\frac{P_{1<}(-u) + P_{1>}(u)}{\int_{|u'|<u} u'^2 P_1(u')\,du'} = 0. \tag{1.58}$$


This condition is always satisfied if the variance is finite, but allows one to include the
marginal cases such as a power-law with $\mu = 2$.

The central limit theorem and information theory

It is interesting to notice that the Gaussian is the law of maximum entropy (or minimum
information) such that its variance is fixed. The missing information quantity $\mathcal{I}$ (or
entropy) associated with a probability distribution $P$ is defined as:²⁰

$$\mathcal{I}[P] \equiv -\int P(x) \log P(x)\,dx. \tag{1.59}$$

The distribution maximizing $\mathcal{I}[P]$ for a given value of the variance is obtained by taking a
functional derivative with respect to $P(x)$:

$$\frac{\partial}{\partial P(x)}\left[\mathcal{I}[P] - \zeta \int x'^2 P(x')\,dx' - \zeta' \int P(x')\,dx'\right] = 0, \tag{1.60}$$


where $\zeta$ is fixed by the condition $\int x^2 P(x)\,dx = \sigma^2$ and $\zeta'$ by the normalization of $P(x)$.
It is immediately seen that the solution to Eq. (1.60) is indeed the Gaussian. The numerical
value of its entropy is:

$$\mathcal{I}_G = \frac{1}{2} + \frac{1}{2}\log(2\pi) + \log(\sigma) \simeq 1.419 + \log(\sigma). \tag{1.61}$$

For comparison, one can compute the entropy of the symmetric exponential distribution,
which is:

$$\mathcal{I}_E = 1 + \frac{\log 2}{2} + \log(\sigma) \simeq 1.346 + \log(\sigma). \tag{1.62}$$

It is important to realize that the convolution operation is ‘information burning’, since
all the details of the elementary distribution $P_1(x_1)$ progressively disappear while the
Gaussian distribution emerges.
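The two entropy values of Eqs. (1.61) and (1.62) can be checked numerically. The sketch below is only an illustration: the midpoint-rule integrator, the integration range and the choice $\sigma = 1$ are assumptions made here, not part of the text.

```python
import math

def entropy(pdf, lo, hi, n=200_000):
    """Midpoint-rule estimate of -integral of p(x) log p(x) over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        p = pdf(lo + (k + 0.5) * h)
        if p > 0.0:                      # p log p -> 0 where the density underflows
            total -= p * math.log(p) * h
    return total

sigma = 1.0  # common standard deviation (illustrative choice)

def gaussian(x):
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def sym_exponential(x):
    # Symmetric exponential exp(-|x|/b) / (2b); variance 2 b^2 = sigma^2
    b = sigma / math.sqrt(2)
    return math.exp(-abs(x) / b) / (2 * b)

I_G = entropy(gaussian, -40 * sigma, 40 * sigma)
I_E = entropy(sym_exponential, -40 * sigma, 40 * sigma)

# Numerical value next to the closed form of Eqs. (1.61) and (1.62)
print(I_G, 0.5 + 0.5 * math.log(2 * math.pi) + math.log(sigma))
print(I_E, 1 + 0.5 * math.log(2) + math.log(sigma))
```

At equal variance the Gaussian entropy comes out larger than that of the symmetric exponential, as expected for the maximum-entropy law.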


1.6.2 Convergence to a Lévy distribution 


Let us now turn to the case of the sum of a large number $N$ of iid random
variables, asymptotically distributed as a power-law with $\mu < 2$, and with a tail
amplitude $A^\mu = A_+^\mu = A_-^\mu$ (cf. Eq. (1.14)). The variance of the distribution is
thus infinite. The limit distribution for large $N$ is then a stable Lévy distribution
of exponent $\mu$ and with a tail amplitude $N A^\mu$. If the positive and negative tails
of the elementary distribution $P_1(x_1)$ are characterized by different amplitudes
($A_-^\mu$ and $A_+^\mu$) one then obtains an asymmetric Lévy distribution with parameter
$\beta = (A_+^\mu - A_-^\mu)/(A_+^\mu + A_-^\mu)$. If the ‘left’ exponent is different from the ‘right’
exponent ($\mu_- \neq \mu_+$), then the smallest of the two wins and one finally obtains
a totally asymmetric Lévy distribution ($\beta = -1$ or $\beta = 1$) with exponent
$\mu = \min(\mu_-, \mu_+)$. The CLT generalized to Lévy distributions applies with the
same precautions as in the Gaussian case above.


20 Note that entropy is defined up to an additive constant. It is common to add 1 to the above definition.

Technically, a distribution $P_1(x_1)$ belongs to the basin of attraction of the Lévy distribution
$L_{\mu,\beta}$ if and only if:

$$\lim_{u\to\infty} \frac{P_{1<}(-u)}{P_{1>}(u)} = \frac{1-\beta}{1+\beta}, \tag{1.63}$$

and for all $r$,

$$\lim_{u\to\infty} \frac{P_{1<}(-u) + P_{1>}(u)}{P_{1<}(-ru) + P_{1>}(ru)} = r^\mu. \tag{1.64}$$

A distribution with an asymptotic tail given by Eq. (1.14) is such that,

$$P_{1<}(u) \underset{u\to-\infty}{\simeq} \frac{A_-^\mu}{|u|^\mu} \quad \text{and} \quad P_{1>}(u) \underset{u\to\infty}{\simeq} \frac{A_+^\mu}{u^\mu}, \tag{1.65}$$

and thus belongs to the attraction basin of the Lévy distribution of exponent $\mu$ and
asymmetry parameter $\beta = (A_+^\mu - A_-^\mu)/(A_+^\mu + A_-^\mu)$.


1.6.3 Large deviations 


The CLT teaches us that the Gaussian approximation is justified to describe the 
‘central’ part of the distribution of the sum of a large number of random variables 
(of finite variance). However, the definition of the centre has remained rather vague 
up to now. The CLT only states that the probability of finding an event in the tails 
goes to zero for large $N$. In the present section, we characterize more precisely the
region where the Gaussian approximation is valid.

If $X$ is the sum of $N$ iid random variables of mean $m$ and variance $\sigma^2$, one
defines a ‘rescaled variable’ $U$ as:

$$U = \frac{X - Nm}{\sigma\sqrt{N}}, \tag{1.66}$$


which according to the CLT tends towards a Gaussian variable of zero mean and
unit variance. Hence, for any fixed $u$, one has:

$$\lim_{N\to\infty} P_>(u) = P_{G>}(u), \tag{1.67}$$

where $P_{G>}(u)$ is related to the error function, and describes the weight
contained in the tails of the Gaussian:

$$P_{G>}(u) = \int_u^\infty \frac{1}{\sqrt{2\pi}} \exp(-u'^2/2)\,du' = \frac{1}{2}\,\mathrm{erfc}\left(\frac{u}{\sqrt{2}}\right). \tag{1.68}$$


However, the above convergence is not uniform. The value of $N$ such that the
approximation $P_>(u) \simeq P_{G>}(u)$ becomes valid depends on $u$. Conversely, for
fixed $N$, this approximation is only valid for $u$ not too large: $|u| \ll u_0(N)$.

One can estimate $u_0(N)$ in the case where the elementary distribution $P_1(x_1)$ is
‘narrow’, that is, decreasing faster than any power-law when $|x_1| \to \infty$, such that
all the moments are finite. In this case, all the cumulants of $P_1$ are finite and one
can obtain a systematic expansion in powers of $N^{-1/2}$ of the difference
$\Delta P_>(u) \equiv P_>(u) - P_{G>}(u)$,

$$\Delta P_>(u) \simeq \frac{\exp(-u^2/2)}{\sqrt{2\pi}} \left(\frac{Q_1(u)}{N^{1/2}} + \frac{Q_2(u)}{N} + \cdots + \frac{Q_k(u)}{N^{k/2}} + \cdots\right), \tag{1.69}$$

where the $Q_k(u)$ are polynomial functions which can be expressed in terms of
the normalized cumulants $\lambda_n$ (cf. Eq. (1.12)) of the elementary distribution. More
explicitly, the first two terms are given by:

$$Q_1(u) = \tfrac{1}{6}\lambda_3 (u^2 - 1), \tag{1.70}$$

and

$$Q_2(u) = \tfrac{1}{72}\lambda_3^2 u^5 + \tfrac{1}{8}\left(\tfrac{1}{3}\lambda_4 - \tfrac{10}{9}\lambda_3^2\right) u^3 + \left(\tfrac{5}{24}\lambda_3^2 - \tfrac{1}{8}\lambda_4\right) u. \tag{1.71}$$


One recovers the fact that if all the cumulants of $P_1(x_1)$ of order larger than
two are zero, all the $Q_k$ are also identically zero and so is the difference between
$P(x, N)$ and the Gaussian.
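The leading correction of Eq. (1.69) can be tried out on a case where the exact answer is known. The sketch below is an illustration under stated assumptions: it uses sums of unit-rate exponential variables ($m = \sigma = 1$, $\lambda_3 = 2$), whose tail probability has the closed form $e^{-x}\sum_{k<N} x^k/k!$, and the values $N = 100$, $u = 2$ are arbitrary choices.

```python
import math

def exact_tail(N, x):
    """P(X > x) for a sum of N unit-rate exponentials: exp(-x) * sum_{k<N} x^k / k!."""
    term = math.exp(-x)
    total = term
    for k in range(1, N):
        term *= x / k
        total += term
    return total

def gaussian_tail(u):
    """P_G>(u) of Eq. (1.68)."""
    return 0.5 * math.erfc(u / math.sqrt(2))

def q1(u, lam3):
    """First polynomial of the expansion, Eq. (1.70)."""
    return lam3 * (u * u - 1) / 6

N, u = 100, 2.0                 # illustrative values
lam3 = 2.0                      # normalized third cumulant of the exponential
x = N + u * math.sqrt(N)        # x = N m + u sigma sqrt(N) with m = sigma = 1
phi = math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

p_exact = exact_tail(N, x)
p_gauss = gaussian_tail(u)
p_first = p_gauss + phi * q1(u, lam3) / math.sqrt(N)   # Gaussian + Q_1 term
print(p_exact, p_gauss, p_first)
```

With these values the Gaussian tail visibly underestimates the exact probability, and adding the $Q_1$ term removes most of the discrepancy.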

For a general asymmetric elementary distribution $P_1$, $\lambda_3$ is non-zero. The leading
term in the above expansion when $N$ is large is thus $Q_1(u)$. For the Gaussian
approximation to be meaningful, one must at least require that this term is small in
the central region where $u$ is of order one, which corresponds to $x - mN \sim \sigma\sqrt{N}$.
This thus imposes that $N \gg N^* = \lambda_3^2$. The Gaussian approximation remains
valid whenever the relative error is small compared to 1. For large $u$ (which will
be justified for large $N$), the relative error is obtained by dividing Eq. (1.69) by
$P_{G>}(u) \simeq \exp(-u^2/2)/(u\sqrt{2\pi})$. One then obtains the following condition:²¹

$$\lambda_3 u^3 \ll N^{1/2} \quad \text{i.e.} \quad |x - Nm| \ll \sigma\sqrt{N}\left(\frac{N}{N^*}\right)^{1/6}. \tag{1.72}$$

This shows that the central region has an extension growing as $N^{2/3}$.

A symmetric elementary distribution is such that $\lambda_3 = 0$; it is then the kurtosis
$\kappa \equiv \lambda_4$ that fixes the first correction to the Gaussian when $N$ is large, and thus the
extension of the central region. The conditions now read: $N \gg N^* = \lambda_4$ and

$$\lambda_4 u^4 \ll N \quad \text{i.e.} \quad |x - Nm| \ll \sigma\sqrt{N}\left(\frac{N}{N^*}\right)^{1/4}. \tag{1.73}$$

The central region now extends over a region of width $N^{3/4}$.

The results of the present section do not directly apply if the elementary
distribution $P_1(x_1)$ decreases as a power-law (‘broad distribution’). In this case,
some of the cumulants are infinite and the above cumulant expansion, Eq. (1.69), is
meaningless. In the next section, we shall see that in this case the ‘central’ region
is much more restricted than in the case of ‘narrow’ distributions. We shall then
describe, in Section 1.6.5, the case of ‘truncated’ power-law distributions, where the
above conditions become asymptotically relevant. These laws however may have
a very large kurtosis, which depends on the point where the truncation becomes
noticeable, and the above condition $N \gg \lambda_4$ can be hard to satisfy.

21 The above arguments can actually be made fully rigorous, see [Feller].


Cramér function

More generally, when $N$ is large, one can write the distribution of the sum of $N$ iid random
variables as:²²

$$P(x, N) \underset{N\to\infty}{\simeq} \exp\left[-N S\left(\frac{x}{N}\right)\right], \tag{1.74}$$

where $S$ is the so-called Cramér function, which gives some information about the
probability of $X$ even outside the ‘central’ region. When the variance is finite, $S$ grows
as $S(u) \propto u^2$ for small $u$'s, which again leads to a Gaussian central region. For finite $u$, $S$
can be computed using Laplace's saddle point method, valid for $N$ large. By definition:

$$P(x, N) = \int \exp\left[N\left(-iz\frac{x}{N} + \log[\hat{P}_1(z)]\right)\right] \frac{dz}{2\pi}. \tag{1.75}$$


When $N$ is large, the above integral is dominated by the neighbourhood of the point $z^*$
where the term in the exponential is stationary. The result can be written as:

$$P(x, N) \simeq \exp\left[-N S\left(\frac{x}{N}\right)\right], \tag{1.76}$$

with $S(u)$ given by:

$$\left.\frac{d \log[\hat{P}_1(z)]}{dz}\right|_{z=z^*} = iu, \qquad S(u) = iz^* u - \log[\hat{P}_1(z^*)], \tag{1.77}$$

which, in principle, allows one to estimate $P(x, N)$ even outside the central region. Note
that if $S(u)$ is finite for finite $u$, the corresponding probability is exponentially small in $N$.
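As a hedged illustration of the large-deviation form $P(x, N) \simeq \exp[-N S(x/N)]$, one can take unit-rate exponential variables (an assumption made here; their mean is 1 rather than 0, so the minimum of $S$ sits at $u = 1$). For this case the Cramér function is $S(u) = u - 1 - \log u$, and the exact density of the sum is a Gamma density, so the two can be compared directly.

```python
import math

def cramer_exponential(u):
    """Cramer function of unit-rate exponentials (mean 1): S(u) = u - 1 - log(u), u > 0."""
    return u - 1.0 - math.log(u)

def minus_log_density_per_term(N, u):
    """-log P(x, N) / N at x = u * N, from the exact Gamma density of index N."""
    x = u * N
    log_p = (N - 1) * math.log(x) - x - math.lgamma(N)
    return -log_p / N

u = 2.0  # a point well outside the central region (illustrative choice)
for N in (50, 500, 5000):
    print(N, minus_log_density_per_term(N, u), cramer_exponential(u))
```

The difference between the two columns shrinks roughly like $\log N / N$, which is the size of the prefactor neglected in the exponential estimate.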


1.6.4 The CLT at work on a simple case 
It is helpful to give some flesh to the above general statements, by working out 
explicitly the convergence towards the Gaussian in two exactly soluble cases. On 
these examples, one clearly sees the domain of validity of the CLT as well as its 
limitations. 
Let us first study the case of positive random variables distributed according to 
the exponential distribution: 


$$P_1(x_1) = \Theta(x_1)\,\alpha e^{-\alpha x_1}, \tag{1.78}$$

where $\Theta(x_1)$ is the function equal to 1 for $x_1 \geq 0$ and to 0 otherwise. A simple
computation shows that the above distribution is correctly normalized, has a mean
given by $m = \alpha^{-1}$ and a variance given by $\sigma^2 = \alpha^{-2}$. Furthermore, the exponential
distribution is asymmetrical: its skewness is given by $c_3 = \langle (x - m)^3 \rangle = 2\alpha^{-3}$, or
$\lambda_3 = 2$.

22 We assume that their mean is zero, which can always be achieved through a suitable shift of $x_1$.

The sum of $N$ such variables is distributed according to the $N$th convolution
of the exponential distribution. According to the CLT this distribution should
approach a Gaussian of mean $mN$ and of variance $N\sigma^2$. The $N$th convolution of
the exponential distribution can be computed exactly. The result is:²³

$$P(x, N) = \Theta(x)\,\frac{\alpha^N x^{N-1} e^{-\alpha x}}{(N-1)!}, \tag{1.79}$$

which is called a ‘Gamma’ distribution of index $N$. At first sight, this distribution
does not look very much like a Gaussian! For example, its asymptotic behaviour is
very far from that of a Gaussian: the ‘left’ side is strictly zero for negative $x$, while
the ‘right’ tail is exponential, and thus much fatter than the Gaussian. It is thus
very clear that the CLT does not apply for values of $x$ too far from the mean value.
However, the central region around $Nm = N\alpha^{-1}$ is well described by a Gaussian.
The most probable value ($x^*$) is defined as:

$$\left.\frac{dP(x, N)}{dx}\right|_{x^*} = 0, \tag{1.80}$$

or $x^* = (N-1)m$. An expansion in $x - x^*$ of $P(x, N)$ then gives us:

$$\log P(x, N) = -K(N-1) + \log\alpha - \frac{\alpha^2 (x - x^*)^2}{2(N-1)} + \frac{\alpha^3 (x - x^*)^3}{3(N-1)^2} + O\left((x - x^*)^4\right), \tag{1.81}$$

where

$$K(N) \equiv \log N! + N - N\log N \underset{N\to\infty}{\simeq} \frac{1}{2}\log(2\pi N). \tag{1.82}$$


Hence, to second order in $x - x^*$, $P(x, N)$ is given by a Gaussian of mean $(N-1)m$
and variance $(N-1)\sigma^2$. The relative difference between $N$ and $N-1$ goes to
zero for large $N$. Hence, for the Gaussian approximation to be valid, one requires
not only that $N$ be large compared to one, but also that the higher-order terms in
$(x - x^*)$ be negligible. The cubic correction is small compared to 1 as long as
$\alpha|x - x^*| \ll N^{2/3}$, in agreement with the above general statement, Eq. (1.72),
for an elementary distribution with a non-zero third cumulant. Note also that
for $x \to \infty$, the exponential behaviour of the Gamma distribution coincides (up
to subleading terms in $x^{N-1}$) with the asymptotic behaviour of the elementary
distribution $P_1(x_1)$.

23 This result can be shown by induction using the definition (Eq. (1.50)).
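The expansion of the Gamma density around its most probable value can be checked against the exact logarithm of Eq. (1.79). The following is a minimal sketch; the values $N = 100$, $\alpha = 1$ and the displacement $x - x^* = 10$ are illustrative assumptions.

```python
import math

def log_gamma_density(x, N, alpha=1.0):
    """Exact log of the Gamma density of index N, Eq. (1.79), for x > 0."""
    return N * math.log(alpha) + (N - 1) * math.log(x) - alpha * x - math.lgamma(N)

def log_density_expansion(x, N, alpha=1.0, cubic=True):
    """Expansion around x* = (N - 1) / alpha, to second or third order in (x - x*)."""
    n = N - 1
    d = alpha * (x - n / alpha)                      # alpha * (x - x*)
    K = math.lgamma(n + 1) + n - n * math.log(n)     # K(N - 1) of Eq. (1.82)
    out = -K + math.log(alpha) - d * d / (2 * n)
    if cubic:
        out += d**3 / (3 * n * n)
    return out

N = 100
x = (N - 1) + 10.0               # x - x* = 10, i.e. about one standard deviation
exact = log_gamma_density(x, N)
err_quadratic = abs(exact - log_density_expansion(x, N, cubic=False))
err_cubic = abs(exact - log_density_expansion(x, N, cubic=True))
print(err_quadratic, err_cubic)
```

The purely Gaussian (quadratic) form is already close at one standard deviation from the mode, and the cubic term reduces the remaining error by roughly an order of magnitude.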

Another very instructive example is provided by a distribution which behaves
as a power-law for large arguments, but at the same time has a finite variance to
ensure the validity of the CLT. Consider the following explicit example of a Student
distribution with $\mu = 3$:

$$P_1(x_1) = \frac{2a^3}{\pi (x_1^2 + a^2)^2}, \tag{1.83}$$

where $a$ is a positive constant. This symmetric distribution behaves as a power-law
with $\mu = 3$ (cf. Eq. (1.14)); all its cumulants of order larger than or equal to three
are infinite. However, its variance is finite and equal to $a^2$.


It is useful to compute the characteristic function of this distribution,

$$\hat{P}_1(z) = (1 + a|z|)\,e^{-a|z|}, \tag{1.84}$$

and the first terms of its small $z$ expansion, which read:

$$\hat{P}_1(z) \simeq 1 - \frac{z^2 a^2}{2} + \frac{|z|^3 a^3}{3} + O(z^4). \tag{1.85}$$

The first singular term in this expansion is thus $|z|^3$, as expected from the asymptotic
behaviour of $P_1(x_1)$ in $x_1^{-4}$, and the divergence of the moments of order larger than three.
The $N$th convolution of $P_1(x_1)$ thus has the following characteristic function:

$$\hat{P}_1^N(z) = (1 + a|z|)^N e^{-aN|z|}, \tag{1.86}$$

which, expanded around $z = 0$, gives:

$$\hat{P}_1^N(z) \simeq 1 - \frac{N z^2 a^2}{2} + \frac{N |z|^3 a^3}{3} + O(z^4). \tag{1.87}$$

Note that the $|z|^3$ singularity (which signals the divergence of the moments $m_n$ for $n > 3$)
does not disappear under convolution, even if at the same time $P(x, N)$ converges towards
the Gaussian. The resolution of this apparent paradox is again that the convergence
towards the Gaussian only concerns the centre of the distribution, whereas the tail in $x^{-4}$
survives for ever (as was mentioned in Section 1.5.3).


As follows from the CLT, the centre of $P(x, N)$ is well approximated, for $N$
large, by a Gaussian of zero mean and variance $Na^2$:

$$P(x, N) \simeq \frac{1}{\sqrt{2\pi N}\,a} \exp\left(-\frac{x^2}{2Na^2}\right). \tag{1.88}$$

On the other hand, since the power-law behaviour is conserved upon addition and
the tail amplitudes simply add (cf. Eq. (1.14)), one also has, for large $x$'s:

$$P(x, N) \underset{x\to\infty}{\simeq} \frac{2Na^3}{\pi x^4}. \tag{1.89}$$

The above two expressions, Eqs (1.88) and (1.89), are not incompatible, since these
describe two very different regions of the distribution $P(x, N)$. For fixed $N$, there
is a characteristic value $x_0(N)$ beyond which the Gaussian approximation for
$P(x, N)$ is no longer accurate, and the distribution is described by its asymptotic
power-law regime. The order of magnitude of $x_0(N)$ is fixed by looking at the point
where the two regimes match to one another:

$$\frac{1}{\sqrt{2\pi N}\,a} \exp\left(-\frac{x_0^2}{2Na^2}\right) \simeq \frac{2Na^3}{\pi x_0^4}. \tag{1.90}$$

One thus finds,

$$x_0(N) \simeq a\sqrt{N \log N}, \tag{1.91}$$

(neglecting subleading corrections for large $N$).

This means that the rescaled variable $U = X/(a\sqrt{N})$ becomes for large $N$ a
Gaussian variable of unit variance, but this description ceases to be valid as soon
as $u \sim \sqrt{\log N}$, which grows very slowly with $N$. For example, for $N$ equal to a
million, the Gaussian approximation is only acceptable for fluctuations of $u$ of less
than three or four RMS!
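The matching condition of Eq. (1.90) can also be solved numerically rather than to leading order. The sketch below is an illustration: writing $x_0 = u\,a\sqrt{N}$, Eq. (1.90) becomes $u^2/2 - 4\log u = \log\left(\sqrt{\pi N}/(2\sqrt{2})\right)$, and the bisection bracket $[2, 100]$ is an assumption chosen so that the larger of the two roots is selected.

```python
import math

def crossover_u(N):
    """Larger root of Eq. (1.90) in rescaled units u = x0 / (a sqrt(N)), by bisection."""
    rhs = math.log(math.sqrt(math.pi * N) / (2 * math.sqrt(2)))

    def f(u):
        return 0.5 * u * u - 4 * math.log(u) - rhs

    lo, hi = 2.0, 100.0          # f is increasing for u > 2, with f(lo) < 0 < f(hi)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for N in (10**3, 10**6, 10**9):
    # numerical matching point next to the leading-order estimate sqrt(log N)
    print(N, crossover_u(N), math.sqrt(math.log(N)))
```

The numerical root lies somewhat above $\sqrt{\log N}$ because of the logarithmic corrections neglected in Eq. (1.91), but it grows just as slowly with $N$: the Gaussian region remains only a few RMS wide even for very large $N$.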

Finally, the CLT states that the weight of the regions where $P(x, N)$ substantially
differs from the Gaussian goes to zero when $N$ becomes large. For our
example, one finds that the probability that $X$ falls in the tail region rather than
in the central region is given by:

$$P_<(-x_0) + P_>(x_0) \simeq 2\int_{a\sqrt{N\log N}}^{\infty} \frac{2a^3 N}{\pi x^4}\,dx \propto \frac{1}{\sqrt{N}\log^{3/2} N}, \tag{1.92}$$

which indeed goes to zero for large $N$.

The above arguments are not special to the case $\mu = 3$ and in fact apply more
generally, as long as $\mu > 2$, i.e. when the variance is finite. In the general case, one
finds that the CLT is valid in the region $|x| \ll x_0 \propto \sqrt{N \log N}$, and that the weight
of the non-Gaussian tails is given by:

$$P_<(-x_0) + P_>(x_0) \propto \frac{1}{N^{\mu/2-1}\log^{\mu/2} N}, \tag{1.93}$$

which tends to zero for large $N$. However, one should notice that as $\mu$ approaches
the ‘dangerous’ value $\mu = 2$, the weight of the tails becomes more and more
important. For $\mu < 2$, the whole argument collapses since the weight of the tails
would grow with $N$. In this case, however, the convergence is no longer towards
the Gaussian, but towards the Lévy distribution of exponent $\mu$.

1.6.5 Truncated Lévy distributions

An interesting case is when the elementary distribution $P_1(x_1)$ is a truncated
Lévy distribution (TLD) as defined in Section 1.3.3. The first cumulants of the
distribution defined by Eq. (1.23) read, for $1 < \mu < 2$:


$$c_2 = \mu(\mu-1)\,\frac{a_\mu}{|\cos(\pi\mu/2)|}\,\alpha^{\mu-2}, \qquad c_3 = 0. \tag{1.94}$$


The kurtosis $\kappa = \lambda_4 = c_4/c_2^2$ is given by:

$$\lambda_4 = \frac{(3-\mu)(2-\mu)\,|\cos(\pi\mu/2)|}{\mu(\mu-1)\,a_\mu \alpha^\mu}. \tag{1.95}$$

Note that the case $\mu = 2$ corresponds to the Gaussian, for which $\lambda_4 = 0$ as
expected. On the other hand, when $\alpha \to 0$, one recovers a pure Lévy distribution,
for which $c_2$ and $c_4$ are formally infinite. Finally, if $\alpha \to \infty$ with $a_\mu \alpha^{\mu-2}$ fixed,
one also recovers the Gaussian.

If one considers the sum of $N$ random variables distributed according to a TLD,
the condition for the CLT to be valid reads (for $\mu < 2$):²⁴

$$N \gg N^* = \lambda_4 \implies (N a_\mu)^{1/\mu} \gg \alpha^{-1}. \tag{1.96}$$


This condition has a very simple intuitive meaning. A TLD behaves very much
like a pure Lévy distribution as long as $x \ll \alpha^{-1}$. In particular, it behaves as a
power-law of exponent $\mu$ and tail amplitude $A^\mu \propto a_\mu$ in the region where $x$ is
large but still much smaller than $\alpha^{-1}$ (we thus also assume that $\alpha$ is very small). If
$N$ is not too large, most values of $x$ fall in the Lévy-like region. The largest value of
$x$ encountered is thus of order $x_{\max} \simeq A N^{1/\mu}$ (cf. Eq. (1.42)). If $x_{\max}$ is very small
compared to $\alpha^{-1}$, it is consistent to forget the exponential cut-off and think of the
elementary distribution as a pure Lévy distribution. One thus observes a first regime
in $N$ where the typical value of $X$ grows as $N^{1/\mu}$, as if $\alpha$ was zero.²⁵ However, as
illustrated in Figure 1.9, this regime ends when $x_{\max}$ reaches the cut-off value $\alpha^{-1}$;
this happens precisely when $N$ is of the order of $N^*$ defined above. For $N > N^*$,
the variable $X$ progressively converges towards a Gaussian variable of width $\sqrt{N}$,
at least in the region where $|x| \ll \sigma N^{3/4}/N^{*1/4}$. The typical amplitude of $X$ thus
behaves (as a function of $N$) as sketched in Figure 1.9. Notice that the asymptotic
part of the distribution of $X$ (outside the central region) decays as an exponential
for all values of $N$.


24 One can see by inspection that the other conditions, concerning higher-order cumulants, and which read
$N^{k-1}\lambda_{2k}^{-1} \gg 1$, are actually equivalent to the one written here.

25 Note however that the variance of $X$ grows like $N$ for all $N$. However, the variance is dominated by the cut-off
and, in the region $N \ll N^*$, grossly overestimates the typical values of $X$, see Section 2.3.2.

Fig. 1.9. Behaviour of the typical value of $X$ as a function of $N$ for TLD variables. When
$N \ll N^*$, $x$ grows as $N^{1/\mu}$ (dotted line). When $N \sim N^*$, $x$ reaches the value $\alpha^{-1}$ and the
exponential cut-off starts being relevant. When $N \gg N^*$, the behaviour predicted by the
CLT sets in, and one recovers $x \propto \sqrt{N}$ (plain line).


1.6.6 Conclusion: survival and vanishing of tails 


The CLT thus teaches us that if the number of terms in a sum is large, the
sum becomes (nearly) a Gaussian variable. This sum can represent the temporal
aggregation of the daily fluctuations of a financial asset, or the aggregation, in
a portfolio, of different stocks. The Gaussian (or non-Gaussian) nature of this
sum is thus of crucial importance for risk control, since the extreme tails of the
distribution correspond to the most ‘dangerous’ fluctuations. As we have discussed
above, fluctuations are never Gaussian in the far-tails: one can explicitly show
that if the elementary distribution decays as a power-law (or as an exponential,
which formally corresponds to $\mu = \infty$), the distribution of the sum decays in
the very same manner outside the central region, i.e. much more slowly than the
Gaussian. The CLT simply ensures that these tail regions are expelled more and
more towards large values of $X$ when $N$ grows, and their associated probability is
smaller and smaller. When confronted with a concrete problem, one must decide
whether $N$ is large enough to be satisfied with a Gaussian description of the risks. In
particular, if $N$ is less than the characteristic value $N^*$ defined above, the Gaussian
approximation is very bad.

1.7 Correlations, dependence and non-stationary models (*)

We have assumed up to now that the random variables were independent and identically
distributed. Although the general case cannot be discussed as thoroughly
as the iid case, it is useful to illustrate how the CLT must be modified on a few
examples, some of which are particularly relevant in the context of financial time
series.


1.7.1 Correlations 


Let us assume that the correlation function $C_{i,j}$ (defined as $\langle x_i x_j \rangle - m^2$) of the
random variables is non-zero for $i \neq j$. We also assume that the process is
stationary, i.e. that $C_{i,j}$ only depends on $|i - j|$: $C_{i,j} = C(|i - j|)$, with $C(\infty) = 0$.
The variance of the sum can be expressed in terms of the matrix $C$ as:²⁶

$$\langle x^2 \rangle = \sum_{i,j=1}^{N} C_{i,j} = N\sigma^2 + 2N \sum_{\ell=1}^{N} \left(1 - \frac{\ell}{N}\right) C(\ell), \tag{1.97}$$

where $\sigma^2 \equiv C(0)$. From this expression, it is readily seen that if $C(\ell)$ decays
faster than $1/\ell$ for large $\ell$, the sum over $\ell$ tends to a constant for large $N$, and
thus the variance of the sum still grows as $N$, as for the usual CLT. If however
$C(\ell)$ decays for large $\ell$ as a power-law $\ell^{-\nu}$, with $\nu < 1$, then the variance
grows faster than $N$, as $N^{2-\nu}$: correlations thus enhance fluctuations. Hence, when
$\nu < 1$, the standard CLT certainly has to be amended. The problem of the limit
distribution in these cases is however not solved in general. For example, if the
$X_i$ are correlated Gaussian variables, it is easy to show that the resulting sum is
also Gaussian, whatever the value of $\nu$. Another solvable case is when the $X_i$ are
correlated Gaussian variables, but one takes the sum of the squares of the $X_i$'s. This
sum converges towards a Gaussian of width $\sqrt{N}$ whenever $\nu > 1/2$, but towards a
non-trivial limit distribution of a new kind (i.e. neither Gaussian nor Lévy stable)
when $\nu < 1/2$. In this last case, the proper rescaling factor must be chosen as $N^{1-\nu}$.

One can also construct anti-correlated random variables, the sum of which
grows slower than $\sqrt{N}$. In the case of power-law correlated or anti-correlated
Gaussian random variables, one speaks of ‘fractional Brownian motion’. This
notion was introduced in [Mandelbrot and Van Ness].
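The two growth regimes of the variance can be read off Eq. (1.97) numerically. In the sketch below the correlation function $C(\ell) = \sigma^2 (1 + \ell)^{-\nu}$ is an illustrative assumption (it has the required power-law decay and satisfies $C(0) = \sigma^2$); the exponent is estimated from the values at two sizes $N_1$ and $N_2$, also chosen arbitrarily.

```python
import math

def variance_of_sum(N, nu, sigma2=1.0):
    """Right-hand side of Eq. (1.97) with the assumed C(l) = sigma2 * (1 + l)**(-nu)."""
    s = sum((1.0 - l / N) * sigma2 * (1.0 + l) ** (-nu) for l in range(1, N + 1))
    return N * sigma2 + 2.0 * N * s

def growth_exponent(nu, N1=1_000, N2=10_000):
    """Effective exponent b in variance ~ N**b, estimated between N1 and N2."""
    ratio = variance_of_sum(N2, nu) / variance_of_sum(N1, nu)
    return math.log(ratio) / math.log(N2 / N1)

print(growth_exponent(2.0))   # short-range correlations: exponent close to 1
print(growth_exponent(0.5))   # long-range correlations: exponent close to 2 - nu
```

For $\nu = 2$ the correlations only change the prefactor of the linear growth, whereas for $\nu = 1/2$ the effective exponent approaches $2 - \nu = 3/2$.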


1.7.2. Non-stationary models and dependence 


It may happen that the distributions of the elementary random variables $P_1(x_1)$,
$P_2(x_2), \ldots, P_N(x_N)$ are not all identical. This is the case, for example, when the
variance of the random process depends upon time: in financial markets, it is a
well-known fact that the daily volatility is time dependent, taking rather high levels
in periods of uncertainty, and reverting back to lower values in calmer periods. For
example, the volatility of the bond market was very high during 1994, and
decreased in later years. Similarly, the volatility of stock markets has increased
since August 1997.

26 We again assume in the following, without loss of generality, that the mean $m$ is zero.

If the distribution $P_k$ varies sufficiently ‘slowly’, one can in principle measure
some of its moments (for example its mean and variance) over a time scale which
is long enough to allow for a precise determination of these moments, but short
compared to the time scale over which $P_k$ is expected to vary. The situation is
less clear if $P_k$ varies ‘rapidly’. Suppose for example that $P_k(x_k)$ is a Gaussian
distribution of variance $\sigma_k^2$, which is itself a random variable. We shall denote as
$\overline{(\cdots)}$ the average over the random variable $\sigma_k$, to distinguish it from the notation
$\langle\cdots\rangle_k$, which we have used to describe the average over the probability distribution
$P_k$. If $\sigma_k$ varies rapidly, it is impossible to separate the two sources of uncertainty.
Thus, the empirical histogram constructed from the series $\{x_1, x_2, \ldots, x_N\}$ leads to
an ‘apparent’ distribution $\bar{P}$ which is non-Gaussian even if each individual $P_k$ is
Gaussian. Indeed, from:

$$\bar{P}(x) \equiv \int P(\sigma)\,\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) d\sigma, \tag{1.98}$$

one can calculate the kurtosis of $\bar{P}$ as:

$$\bar{\kappa} = \frac{\overline{\langle x^4 \rangle}}{\left(\overline{\langle x^2 \rangle}\right)^2} - 3 \equiv 3\left(\frac{\overline{\sigma^4}}{\left(\overline{\sigma^2}\right)^2} - 1\right). \tag{1.99}$$


Since for any random variable one has $\overline{\sigma^4} \geq (\overline{\sigma^2})^2$ (the equality being reached only
if $\sigma^2$ does not fluctuate at all), one finds that $\bar{\kappa}$ is always positive. The volatility
fluctuations can thus lead to ‘fat tails’. More precisely, let us assume that the
probability distribution of the RMS, $P(\sigma)$, decays itself for large $\sigma$ as $\exp(-\sigma^c)$,
$c > 0$. Assuming $P_k$ to be Gaussian, it is easy to obtain, using a saddle-point
method (cf. Eq. (1.75)), that for large $x$ one has:

$$\log[\bar{P}(x)] \propto -x^{2c/(2+c)}. \tag{1.100}$$

Since $2c/(2+c) < 2$, this asymptotic decay is always much slower than in the Gaussian
case, which corresponds to $c \to \infty$. The case where the volatility itself has a
Gaussian tail ($c = 2$) leads to an exponential decay of $\bar{P}(x)$.
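A small worked case may help to see how a fluctuating volatility alone produces kurtosis. The sketch below assumes a hypothetical two-state volatility ($\sigma = 1$ or $\sigma = 2$ with equal probability), which is the simplest discrete instance of Eq. (1.98); it evaluates Eq. (1.99) directly and cross-checks it by integrating the moments of the mixture density. The integration range and grid size are arbitrary choices.

```python
import math

def mixture_kurtosis(sigmas, weights):
    """Kurtosis of the 'apparent' distribution from Eq. (1.99)."""
    m2 = sum(w * s**2 for s, w in zip(sigmas, weights))   # average of sigma^2
    m4 = sum(w * s**4 for s, w in zip(sigmas, weights))   # average of sigma^4
    return 3.0 * (m4 / m2**2 - 1.0)

def mixture_pdf(x, sigmas, weights):
    """Discrete version of Eq. (1.98): weighted sum of zero-mean Gaussians."""
    return sum(w * math.exp(-x * x / (2 * s * s)) / math.sqrt(2 * math.pi * s * s)
               for s, w in zip(sigmas, weights))

def kurtosis_by_quadrature(sigmas, weights, L=40.0, n=200_000):
    """<x^4> / <x^2>^2 - 3 from midpoint-rule moments of the mixture density."""
    h = 2 * L / n
    m2 = m4 = 0.0
    for k in range(n):
        x = -L + (k + 0.5) * h
        p = mixture_pdf(x, sigmas, weights) * h
        m2 += x * x * p
        m4 += x**4 * p
    return m4 / m2**2 - 3.0

sigmas, weights = (1.0, 2.0), (0.5, 0.5)   # hypothetical two-state volatility
print(mixture_kurtosis(sigmas, weights), kurtosis_by_quadrature(sigmas, weights))
```

Both numbers agree and are strictly positive, although each conditional distribution is exactly Gaussian.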

Another interesting case is when $\sigma^2$ is distributed as a completely asymmetric
Lévy distribution ($\beta = 1$) of exponent $\mu < 1$. Using the properties of Lévy
distributions, one can then show that $\bar{P}$ is itself a symmetric Lévy distribution
($\beta = 0$), of exponent equal to $2\mu$.

If the fluctuations of $\sigma_k$ are themselves correlated, one observes an interesting
case of dependence. For example, if $\sigma_k$ is large, $\sigma_{k+1}$ will probably also be large.
The fluctuation $X_k$ thus has a large probability to be large (but of arbitrary sign)
twice in a row. We shall often refer, in the following, to a simple model where $x_k$
can be written as a product $\epsilon_k \sigma_k$, where $\epsilon_k$ are iid random variables of zero mean
and unit variance, and $\sigma_k$ corresponds to the local ‘scale’ of the fluctuations, which
can be correlated in time. The correlation function of the $X_k$ is thus given by:

$$\langle x_i x_j \rangle = \overline{\sigma_i \sigma_j}\,\langle \epsilon_i \epsilon_j \rangle = \delta_{i,j}\,\overline{\sigma^2}. \tag{1.101}$$

Hence the $X_k$ are uncorrelated random variables, but they are not independent since
a higher-order correlation function reveals a richer structure. Let us for example
consider the correlation of $X_k^2$:

$$\langle x_i^2 x_j^2 \rangle - \langle x_i^2 \rangle\langle x_j^2 \rangle = \overline{\sigma_i^2 \sigma_j^2} - \overline{\sigma_i^2}\;\overline{\sigma_j^2} \qquad (i \neq j), \tag{1.102}$$


which indeed has an interesting temporal behaviour: see Section 2.4.²⁷ However,
even if the correlation function $\overline{\sigma_i^2\sigma_j^2} - \overline{\sigma^2}^2$ decreases very slowly with $|i - j|$,
one can show that the sum of the $X_k$, obtained as $\sum_{k=1}^N \epsilon_k \sigma_k$, is still governed by
the CLT, and converges for large $N$ towards a Gaussian variable. A way to see this
is to compute the average kurtosis of the sum, $\kappa_N$. As shown in Appendix A, one
finds the following result:

$$\kappa_N = \frac{1}{N}\left[\kappa_0 + (3 + \kappa_0)\,g(0) + 6\sum_{\ell=1}^{N}\left(1 - \frac{\ell}{N}\right) g(\ell)\right], \tag{1.103}$$

where $\kappa_0$ is the kurtosis of the variable $\epsilon$, and $g(\ell)$ the correlation function of the
variance, defined as:

$$\overline{\sigma_i^2\sigma_j^2} - \overline{\sigma^2}^2 = \overline{\sigma^2}^2\,g(|i - j|). \tag{1.104}$$


It is interesting to see that for $N = 1$, the above formula gives
$\kappa_1 = \kappa_0 + (3 + \kappa_0)g(0) > \kappa_0$, which means that even if $\kappa_0 = 0$, a fluctuating volatility is enough to
produce some kurtosis. More importantly, one sees that if the variance correlation
function $g(\ell)$ decays with $\ell$, the kurtosis $\kappa_N$ tends to zero with $N$, thus showing
that the sum indeed converges towards a Gaussian variable. For example, if $g(\ell)$
decays as a power-law $\ell^{-\nu}$ for large $\ell$, one finds that for large $N$:

$$\kappa_N \propto \frac{1}{N} \quad \text{for } \nu > 1; \qquad \kappa_N \propto \frac{1}{N^\nu} \quad \text{for } \nu < 1. \tag{1.105}$$

27 Note that for $i \neq j$ this correlation function can be zero either because $\sigma$ is identically equal to a certain value
$\sigma_0$, or because the fluctuations of $\sigma$ are completely uncorrelated from one time to the next.

Hence, long-range correlation in the variance considerably slows down the convergence
towards the Gaussian. This remark will be of importance in the following,
since financial time series often reveal long-ranged volatility fluctuations.
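The slow decay of the kurtosis can be made concrete by evaluating Eq. (1.103) for a power-law variance correlation. In this sketch the form $g(\ell) = g_0 (1 + \ell)^{-\nu}$ with $g_0 = 1$ and the choice $\kappa_0 = 0$ are illustrative assumptions, as are the two sizes used to estimate the decay exponent.

```python
import math

def kurtosis_of_sum(N, g, kappa0=0.0):
    """Eq. (1.103): kurtosis of the sum of N terms, given the variance correlation g(l)."""
    s = sum((1.0 - l / N) * g(l) for l in range(1, N + 1))
    return (kappa0 + (3.0 + kappa0) * g(0) + 6.0 * s) / N

def power_law_g(nu, g0=1.0):
    """Assumed variance correlation function g(l) = g0 * (1 + l)**(-nu)."""
    return lambda l: g0 * (1.0 + l) ** (-nu)

def decay_exponent(nu, N1=1_000, N2=10_000):
    """Effective exponent b in kappa_N ~ N**(-b), estimated between N1 and N2."""
    g = power_law_g(nu)
    ratio = kurtosis_of_sum(N1, g) / kurtosis_of_sum(N2, g)
    return math.log(ratio) / math.log(N2 / N1)

print(decay_exponent(2.0))   # short-range case: kurtosis decays roughly as 1/N
print(decay_exponent(0.5))   # long-range case: decay roughly as N**(-nu)
```

With $\nu = 1/2$ the kurtosis of the sum over ten thousand terms is still far from negligible, whereas with $\nu = 2$ it has decayed by the usual factor $1/N$.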


1.8 Central limit theorem for random matrices (*) 


One interesting application of the CLT concerns the spectral properties of ‘random
matrices’. The theory of random matrices has made enormous progress during the
past 30 years, with many applications in physical sciences and elsewhere. More
recently, it has been suggested that random matrices might also play an important
role in finance: an example is discussed in Section 2.7. It is therefore appropriate
to give a cursory discussion of some salient properties of random matrices. The
simplest ensemble of random matrices is one where all elements of the matrix $H$
are iid random variables, with the only constraint that the matrix be symmetrical
($H_{ij} = H_{ji}$). One interesting result is that in the limit of very large matrices, the
distribution of its eigenvalues has universal properties, which are to a large extent
independent of the distribution of the elements of the matrix. This is actually
a consequence of the CLT, as we will show below. Let us introduce first some
notation. The matrix $H$ is a square, $M \times M$ symmetric matrix. Its eigenvalues are
$\lambda_\alpha$, with $\alpha = 1, \ldots, M$. The density of eigenvalues is defined as:

$$\rho(\lambda) = \frac{1}{M}\sum_{\alpha=1}^{M} \delta(\lambda - \lambda_\alpha), \tag{1.106}$$


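where $\delta$ is the Dirac function. We shall also need the so-called ‘resolvent’ $G(\lambda)$ of
the matrix $H$, defined as:

$$G_{ij}(\lambda) \equiv \left(\frac{1}{\lambda\mathbf{1} - H}\right)_{ij}, \tag{1.107}$$

where $\mathbf{1}$ is the identity matrix. The trace of $G(\lambda)$ can be expressed using the
eigenvalues of $H$ as:

$$\mathrm{Tr}\,G(\lambda) = \sum_{\alpha=1}^{M} \frac{1}{\lambda - \lambda_\alpha}. \tag{1.108}$$

The ‘trick’ that allows one to calculate $\rho(\lambda)$ in the large $M$ limit is the following
representation of the $\delta$ function:

$$\frac{1}{x - i\epsilon} = PP\,\frac{1}{x} + i\pi\delta(x) \qquad (\epsilon \to 0), \tag{1.109}$$

where $PP$ means the principal part. Therefore, $\rho(\lambda)$ can be expressed as:

$$\rho(\lambda) = \lim_{\epsilon\to 0} \frac{1}{M\pi}\,\Im\left(\mathrm{Tr}\,G(\lambda - i\epsilon)\right). \tag{1.110}$$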




Our task is therefore to obtain an expression for the resolvent $G(\lambda)$. This can
be done by establishing a recursion relation, allowing one to compute $G(\lambda)$ for
a matrix $H$ with one extra row and one extra column, the elements of which being
$H_{0i}$. One then computes $G_{00}^{M+1}(\lambda)$ (the superscript stands for the size of the matrix
$H$) using the standard formula for matrix inversion:

$$G_{00}^{M+1}(\lambda) = \frac{\mathrm{minor}(\lambda\mathbf{1} - H)_{00}}{\det(\lambda\mathbf{1} - H)}. \tag{1.111}$$

Now, one expands the determinant appearing in the denominator in minors along
the first row, and then each minor is itself expanded in subminors along their first
column. After a little thought, this finally leads to the following expression for
$G_{00}^{M+1}(\lambda)$:

$$\frac{1}{G_{00}^{M+1}(\lambda)} = \lambda - H_{00} - \sum_{i,j=1}^{M} H_{0i} H_{0j}\,G_{ij}^{M}(\lambda). \tag{1.112}$$


i j=l 

This relation is general, without any assumption on the $H_{ij}$. Now, we assume that
the $H_{ij}$'s are iid random variables, of zero mean and variance equal to
$\langle H_{ij}^2 \rangle = \sigma^2/M$. This scaling with $M$ can be understood as follows: when the matrix $H$
acts on a certain vector, each component of the image vector is a sum of $M$ random
variables. In order to keep the image vector (and thus the corresponding eigenvalue)
finite when $M \to \infty$, one should scale the elements of the matrix with the factor
$1/\sqrt{M}$.

One could also write a recursion relation for $G_{0i}^{M+1}$, and establish
self-consistently that $G_{ij} \sim 1/\sqrt{M}$ for $i \neq j$. On the other hand, due to the diagonal
term $\lambda$, $G_{ii}$ remains finite for $M \to \infty$. This scaling allows us to discard all
the terms with $i \neq j$ in the sum appearing in the right-hand side of Eq. (1.112).
Furthermore, since $H_{00} \sim 1/\sqrt{M}$, this term can be neglected compared to $\lambda$. This
finally leads to a simplified recursion relation, valid in the limit $M \to \infty$:

$$\frac{1}{G_{00}^{M+1}(\lambda)} \simeq \lambda - \sum_{i=1}^{M} H_{0i}^2\,G_{ii}^{M}(\lambda). \tag{1.113}$$
Now, using the CLT, we know that the last sum converges, for large $M$, towards
$\sigma^2/M \sum_{i=1}^{M} G_{ii}^{M}(\lambda)$. This result is independent of the precise statistics of the $H_{0i}$,
provided their variance is finite.²⁸ This shows that $G_{00}$ converges for large $M$
towards a well-defined limit $G_\infty$, which obeys the following limit equation:

$$\frac{1}{G_\infty(\lambda)} = \lambda - \sigma^2 G_\infty(\lambda). \tag{1.114}$$


28 The case of Lévy distributed $H_{ij}$'s with infinite variance has been investigated in: P. Cizeau, J.-P. Bouchaud, Theory of Lévy matrices, Physical Review, E 50, 1810 (1994).
The solution to this second-order equation reads:

$$G_\infty(\lambda) = \frac{1}{2\sigma^2} \left[ \lambda - \sqrt{\lambda^2 - 4\sigma^2} \right]. \tag{1.115}$$


(The correct solution is chosen to recover the right limit for $\sigma = 0$.) Now, the only way for this quantity to have a non-zero imaginary part when one adds to $\lambda$ a small imaginary term $i\epsilon$ which tends to zero is that the square root itself is imaginary. The final result for the density of eigenvalues is therefore:

$$\rho(\lambda) = \frac{1}{2\pi\sigma^2} \sqrt{4\sigma^2 - \lambda^2} \quad \text{for } |\lambda| \le 2\sigma, \tag{1.116}$$


and zero elsewhere. This is the well-known 'semi-circle' law for the density of states, first derived by Wigner. This result can be obtained by a variety of other methods if the distribution of matrix elements is Gaussian. In finance, one often encounters correlation matrices C, which have the special property of being positive definite. C can be written as $C = HH^T$, where $H^T$ is the matrix transpose of H. In general, H is a rectangular matrix of size $M \times N$, so C is $M \times M$. In Chapter 2, M will be the number of assets, and N the number of observations (days). In the particular case where N = M, the eigenvalues of C are simply obtained from those of H by squaring them:

$$\lambda_C = \lambda_H^2. \tag{1.117}$$


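If one assumes that the elements of H are random variables, the density of eigenvalues of C can be obtained from:

$$\rho(\lambda_C)\, d\lambda_C = 2\rho(\lambda_H)\, d\lambda_H \quad \text{for } \lambda_H > 0, \tag{1.118}$$

where the factor of 2 comes from the two solutions $\lambda_H = \pm\sqrt{\lambda_C}$; this then leads to:

$$\rho(\lambda_C) = \frac{1}{2\pi\sigma^2} \sqrt{\frac{4\sigma^2 - \lambda_C}{\lambda_C}} \quad \text{for } 0 \le \lambda_C \le 4\sigma^2, \tag{1.119}$$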

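and zero elsewhere. For $N \neq M$, a similar formula exists, which we shall use in the following. In the limit $N, M \to \infty$, with a fixed ratio $Q = N/M \ge 1$, one has:²⁹

$$\rho(\lambda) = \frac{Q}{2\pi\sigma^2} \frac{\sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}}{\lambda}, \qquad \lambda_{\min}^{\max} = \sigma^2 \left( 1 + 1/Q \pm 2\sqrt{1/Q} \right), \tag{1.120}$$

with $\lambda \in [\lambda_{\min}, \lambda_{\max}]$ and where $\sigma^2/N$ is the variance of the elements of H.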


29 A derivation of Eq. (1.120) is given in Appendix B. See also: A. Edelmann. Eigenvalues and condition 
numbers of random matrices. SIAM Journal of Matrix Analysis and Applications, 9, 543 (1988). 
Fig. 1.10. Graph of Eq. (1.120) for Q = 1, 2 and 5.


Equivalently, $\sigma^2$ is the average eigenvalue of C. This form is actually also valid for $Q < 1$, except that there appears a finite fraction of strictly zero eigenvalues, of weight $1 - Q$ (Fig. 1.10).

The most important features predicted by Eq. (1.120) are:

• The fact that the lower 'edge' of the spectrum is positive (except for $Q = 1$); there is therefore no eigenvalue between 0 and $\lambda_{\min}$. Near this edge, the density of eigenvalues exhibits a sharp maximum, except in the limit $Q = 1$ ($\lambda_{\min} = 0$) where it diverges as $\sim 1/\sqrt{\lambda}$.

• The density of eigenvalues also vanishes above a certain upper edge $\lambda_{\max}$.


Note that all the above results are only valid in the limit $N \to \infty$. For finite N, the singularities present at both edges are smoothed: the edges become somewhat blurred, with a small probability of finding eigenvalues above $\lambda_{\max}$ and below $\lambda_{\min}$, which goes to zero when N becomes large.³⁰

In Chapter 2, we will compare the empirical distribution of the eigenvalues of the 
correlation matrix of stocks corresponding to different markets with the theoretical 
prediction given by Eq. (1.120). 
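
As a rough numerical illustration of Eq. (1.120), the sketch below evaluates the two edges and the density, and draws one Gaussian random matrix H to compare against. The function names (`mp_edges`, `mp_density`, `sample_eigenvalues`), the use of NumPy and the Gaussian choice for the entries are assumptions made for this example only; they are not taken from the text.

```python
import numpy as np

def mp_edges(Q, sigma2=1.0):
    """Lower and upper edges (lambda_min, lambda_max) of Eq. (1.120)."""
    root = 2.0 * np.sqrt(1.0 / Q)
    return sigma2 * (1.0 + 1.0 / Q - root), sigma2 * (1.0 + 1.0 / Q + root)

def mp_density(lam, Q, sigma2=1.0):
    """Density rho(lambda) of Eq. (1.120); zero outside the two edges."""
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    lo, hi = mp_edges(Q, sigma2)
    rho = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    x = lam[inside]
    rho[inside] = Q * np.sqrt((hi - x) * (x - lo)) / (2.0 * np.pi * sigma2 * x)
    return rho

def sample_eigenvalues(M, Q, sigma2=1.0, seed=0):
    """Eigenvalues of C = H H^T for one M x N Gaussian H with variance sigma2/N."""
    rng = np.random.default_rng(seed)
    N = int(Q * M)
    H = rng.normal(0.0, np.sqrt(sigma2 / N), size=(M, N))
    return np.linalg.eigvalsh(H @ H.T)
```

A histogram of `sample_eigenvalues(M, Q)` for moderately large M can then be compared with `mp_density` on a grid between the two edges; for finite M the edges appear slightly blurred, as noted above.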


30 See e.g. M. J. Bowick, E. Brézin, Universal scaling of the tails of the density of eigenvalues in random matrix models, Physics Letters, B268, 21 (1991).
1.9 Appendix A: non-stationarity and anomalous kurtosis


In this appendix, we calculate the average kurtosis of the sum $\sum_{i=1}^{N} \delta x_i$, assuming that the $\delta x_i$ can be written as $\sigma_i \epsilon_i$. The $\sigma_i$'s are correlated as:

$$\overline{(D_k - \overline{D})(D_\ell - \overline{D})} = \overline{D}^2 g(|\ell - k|), \qquad D_k \propto \sigma_k^2. \tag{1.121}$$


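Let us first compute $\overline{\left\langle \left( \sum_{i=1}^{N} \delta x_i \right)^4 \right\rangle}$, where $\langle \cdots \rangle$ means an average over the $\epsilon_i$'s and the overline means an average over the $\sigma_i$'s. If $\langle \epsilon_i \rangle = 0$, and $\langle \epsilon_i \epsilon_j \rangle = 0$ for $i \neq j$, one finds:

$$\overline{\left\langle \sum_{i,j,k,l=1}^{N} \delta x_i \delta x_j \delta x_k \delta x_l \right\rangle} = \sum_{i=1}^{N} \overline{\langle \delta x_i^4 \rangle} + 3 \sum_{i \neq j=1}^{N} \overline{\langle \delta x_i^2 \rangle \langle \delta x_j^2 \rangle} = (3 + \kappa_0) \sum_{i=1}^{N} \overline{\langle \delta x_i^2 \rangle^2} + 3 \sum_{i \neq j=1}^{N} \overline{\langle \delta x_i^2 \rangle \langle \delta x_j^2 \rangle}, \tag{1.122}$$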


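where we have used the definition of $\kappa_0$ (the kurtosis of $\epsilon$). On the other hand, one must estimate $\overline{\left\langle \left( \sum_{i=1}^{N} \delta x_i \right)^2 \right\rangle}^{\,2}$. One finds:

$$\overline{\left\langle \left( \sum_{i=1}^{N} \delta x_i \right)^2 \right\rangle}^{\,2} = \sum_{i,j=1}^{N} \overline{\langle \delta x_i^2 \rangle}\; \overline{\langle \delta x_j^2 \rangle}. \tag{1.123}$$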
Gathering the different terms and using the definition Eq. (1.121), one finally 
establishes the following general relation: 


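$$\kappa_N = \frac{1}{N^2 \overline{D}^2} \left[ N \overline{D}^2 (3 + \kappa_0)(1 + g(0)) - 3N \overline{D}^2 + 3 \overline{D}^2 \sum_{i \neq j=1}^{N} g(|i - j|) \right], \tag{1.124}$$

or:

$$\kappa_N = \frac{1}{N} \left[ \kappa_0 + (3 + \kappa_0) g(0) + 6 \sum_{\ell=1}^{N} \left( 1 - \frac{\ell}{N} \right) g(\ell) \right]. \tag{1.125}$$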
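
The relation of Eq. (1.125) between the kurtosis of the sum and the volatility correlation function can be evaluated directly. The following is a minimal sketch in plain Python; the function name `kurtosis_of_sum` and the convention of passing g as a callable of the integer lag are assumptions made for this illustration.

```python
def kurtosis_of_sum(N, kappa0, g):
    """Average kurtosis kappa_N of a sum of N increments, following Eq. (1.125).

    N      -- number of increments in the sum
    kappa0 -- kurtosis of the variables epsilon
    g      -- callable giving the correlation function g(l) for integer l >= 0
    """
    # Weighted sum over lags l = 1..N with the triangular weight (1 - l/N).
    tail = sum((1.0 - l / N) * g(l) for l in range(1, N + 1))
    return (kappa0 + (3.0 + kappa0) * g(0) + 6.0 * tail) / N
```

With g identically zero (constant volatility) the function returns kappa0 / N, the iid result; a slowly decaying g makes the returned value decrease more slowly than 1/N.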


1.10 Appendix B: density of eigenvalues for random correlation matrices 


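This very technical appendix aims at giving a few of the steps of the computation needed to establish Eq. (1.120). One starts from the following representation of the resolvent $G(\lambda)$:

$$G(\lambda) = \sum_{\alpha} \frac{1}{\lambda - \lambda_\alpha} = \frac{\partial}{\partial \lambda} \log \prod_{\alpha} (\lambda - \lambda_\alpha) = \frac{\partial}{\partial \lambda} \log \det(\lambda \mathbf{1} - C) \equiv \frac{\partial}{\partial \lambda} Z(\lambda). \tag{1.126}$$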
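Using the following representation for the determinant of a symmetrical matrix A:

$$[\det A]^{-1/2} = \left( \frac{1}{\sqrt{2\pi}} \right)^{M} \int \exp\left[ -\frac{1}{2} \sum_{i,j=1}^{M} \varphi_i \varphi_j A_{ij} \right] \prod_{i=1}^{M} d\varphi_i, \tag{1.127}$$

we find, in the case where $C = HH^T$:

$$Z(\lambda) = -2 \log \int \exp\left[ -\frac{\lambda}{2} \sum_{i=1}^{M} \varphi_i^2 + \frac{1}{2} \sum_{i,j=1}^{M} \sum_{k=1}^{N} \varphi_i \varphi_j H_{ik} H_{jk} \right] \prod_{i=1}^{M} \frac{d\varphi_i}{\sqrt{2\pi}}. \tag{1.128}$$
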
The claim is that the quantity $G(\lambda)$ is self-averaging in the large M limit. So in order to perform the computation we can replace $G(\lambda)$ for a specific realization of H by the average over an ensemble of H. In fact, one can in the present case show that the average of the logarithm is equal, in the large N limit, to the logarithm of the average. This is not true in general, and one has to use the so-called 'replica trick' to deal with this problem: this amounts to taking the nth power of the quantity to be averaged (corresponding to n copies (replicas) of the system) and to let n go to zero at the end of the computation.³¹

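We assume that the $M \times N$ variables $H_{ik}$ are iid Gaussian variables with zero mean and variance $\sigma^2/N$. To perform the average over the $H_{ik}$, we notice that the integration measure generates a term $\exp\left[ -N \sum H_{ik}^2 / (2\sigma^2) \right]$ that combines with the $H_{ik} H_{jk}$ term above. The summation over the index k doesn't play any role and we get N copies of the result with the index k dropped. The Gaussian integral over $H_i$ gives the square-root of the ratio of the determinant of $[N\delta_{ij}/\sigma^2]$ and $[N\delta_{ij}/\sigma^2 - \varphi_i \varphi_j]$:

$$\left\langle \exp\left[ \frac{1}{2} \sum_{i,j=1}^{M} \sum_{k=1}^{N} \varphi_i \varphi_j H_{ik} H_{jk} \right] \right\rangle = \left( 1 - \frac{\sigma^2}{N} \sum_{i=1}^{M} \varphi_i^2 \right)^{-N/2}. \tag{1.129}$$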


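We then introduce a variable $q \equiv \sigma^2 \sum_i \varphi_i^2 / N$ which we fix using an integral representation of the delta function:

$$\delta\left( q - \sigma^2 \sum_i \varphi_i^2 / N \right) = \int \frac{1}{2\pi} \exp\left[ i\zeta \left( q - \sigma^2 \sum_i \varphi_i^2 / N \right) \right] d\zeta. \tag{1.130}$$

After performing the integral over the $\varphi_i$'s and writing $z = 2i\zeta/N$, we find:

$$Z(\lambda) = -2 \log \frac{N}{4\pi} \int_{-i\infty}^{+i\infty} \int_{-\infty}^{\infty} \exp\left[ -\frac{M}{2} \left( \log(\lambda - \sigma^2 z) + Q \log(1 - q) + Q q z \right) \right] dq\, dz, \tag{1.131}$$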


31 For more details on this technique, see, for example, M. Mézard, G. Parisi, M. A. Virasoro, Spin Glasses and Beyond, World Scientific, Singapore, 1987.




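where $Q = N/M$. The integrals over z and q are performed by the saddle point method, leading to the following equations:

$$Qq = \frac{\sigma^2}{\lambda - \sigma^2 z} \quad \text{and} \quad z = \frac{1}{1 - q}. \tag{1.132}$$

The solution in terms of $q(\lambda)$ is:

$$q(\lambda) = \frac{\sigma^2 (1 - Q) + Q\lambda \pm \sqrt{(\sigma^2 (1 - Q) + Q\lambda)^2 - 4\sigma^2 Q \lambda}}{2 Q \lambda}. \tag{1.133}$$

We find $G(\lambda)$ by differentiating Eq. (1.131) with respect to $\lambda$. The computation is greatly simplified if we notice that at the saddle point the partial derivatives with respect to the functions $q(\lambda)$ and $z(\lambda)$ are zero by construction. One finally finds:

$$G(\lambda) = \frac{M}{\lambda - \sigma^2 z(\lambda)} = \frac{M Q q(\lambda)}{\sigma^2}. \tag{1.134}$$

We can now use Eq. (1.110) and take the imaginary part of $G(\lambda)$ to find the density of eigenvalues:

$$\rho(\lambda) = \frac{\sqrt{4\sigma^2 Q \lambda - (\sigma^2 (1 - Q) + Q\lambda)^2}}{2\pi \lambda \sigma^2}, \tag{1.135}$$

which is identical to Eq. (1.120).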
Statistics of real prices


Le marché, à son insu, obéit à une loi qui le domine : la loi de la probabilité.¹


(Bachelier, Théorie de la spéculation.)


2.1 Aim of the chapter 


The easy access to enormous financial databases, containing thousands of asset 
time series, sampled at a frequency of minutes or sometimes seconds, allows one 
to investigate in detail the statistical features of the time evolution of financial 
assets. The description of any kind of data, be it of physical, biological, or financial 
origin, requires however an interpretation framework, needed to order and give 
a meaning to the observations. To describe necessarily means to simplify, and 
even sometimes betray: the aim of any empirical science is to approach reality 
progressively, through successive improved approximations. 

The goal of the present chapter is to present in this spirit the statistical properties 
of financial time series. We shall propose some plausible mathematical modelling, 
as faithful as possible (though imperfect) of the observed properties of these 
time series. The models we discuss are however not the only possible models; 
the available data is often not sufficiently accurate to distinguish, say, between a 
truncated Lévy distribution and a Student distribution. The choice between the two 
is then guided by mathematical convenience. In this respect, it is interesting to note 
that the word ‘modelling’ has two rather different meanings within the scientific 
community. The first one, often used in applied mathematics, engineering sciences 
and financial mathematics, means that one represents reality using appropriate 
mathematical formulae. This is the scope of the present chapter. The second, 
more common in the physical sciences, is perhaps more ambitious: it aims at 
finding a set of plausible causes sufficient to explain the observed phenomena, 


1 'The market, without knowing it, obeys a law which overwhelms it: the law of probability.'
and therefore, ultimately, to justify the chosen mathematical description. We will however only discuss in a cursory way the 'microscopic' mechanisms of price formation and evolution, of adaptive traders' strategies, herding behaviour between traders, feedback of price variations onto themselves, etc., which are certainly at the origin of the interesting statistics that we shall report below. We feel that this aspect of the problem is still in its infancy, and will evolve rapidly in the coming years. We briefly mention, at the end of this chapter, two simple models of herding and feedback, and give references to several very recent articles.


We shall describe several types of market:

• Very liquid, 'mature' markets of which we take three examples: a US stock index (S&P 500), an exchange rate (DEM/$), and a long-term interest rate index (the German Bund);

• Very volatile markets, such as emerging markets like the Mexican peso;

• Volatility markets: through option markets, the volatility of an asset (which is empirically found to be time dependent) can be seen as a price which is quoted on markets (see Chapter 4);

• Interest rate markets, which give fluctuating prices to loans of different maturities, between which special types of correlations must however exist.


We chose to limit our study to fluctuations taking place on rather short time scales (typically from minutes to months). For longer time scales, the available data set is in general too small to be meaningful. From a fundamental point of view, the influence of the average return is negligible for short time scales, but becomes crucial on long time scales. Typically, a stock varies by several per cent within a day, but its average return is, say, 10% per year, or 0.04% per day. Now, the 'average return' of a financial asset appears to be unstable in time: the past return of a stock is seldom a good indicator of future returns. Financial time series are intrinsically non-stationary: new financial products appear and influence the markets, trading techniques evolve with time, as does the number of participants and their access to the markets, etc. This means that taking very long historical data sets to describe the long-term statistics of markets is a priori not justified. We will thus avoid this difficult (albeit important) subject of long time scales.

The simplified model that we will present in this chapter, and that will be the starting point of the theory of portfolios and options discussed in later chapters, can be summarized as follows. The variation of price of the asset X between time t = 0 and t = T can be decomposed as:

$$x(T) = x_0 + \sum_{k=0}^{N-1} \delta x_k, \tag{2.1}$$

where,




• In a first approximation, and for T not too large, the price increments $\delta x_k$ are random variables which are (i) independent as soon as $\tau$ is larger than a few tens of minutes (on liquid markets) and (ii) identically distributed, according to a TLD, Eq. (1.23), $P_1(\delta x) = L_\mu^{(t)}(\delta x)$ with a parameter $\mu$ approximately equal to 3/2, for all markets.² The exponential cut-off appears 'earlier' in the tail for liquid markets, and can be completely absent in less mature markets.

The results of Chapter 1 concerning sums of random variables, and the convergence towards the Gaussian distribution, allow one to understand the observed 'narrowing' of the tails as the time interval T increases.


• A refined analysis however reveals important systematic deviations from this simple model. In particular, the kurtosis of the distribution of $x(T) - x_0$ decreases more slowly than the 1/N behaviour that would hold if the increments $\delta x_k$ were iid random variables. This suggests a certain form of temporal dependence, of the type discussed in Section 1.7.2. The volatility (or the variance) of the price increments $\delta x$ is actually itself time dependent: this is the so-called 'heteroskedasticity' phenomenon. As we shall see below, periods of high volatility tend to persist over time, thereby creating long-range higher-order correlations in the price increments. On long time scales, one also observes a systematic dependence of the variance of the price increments on the price x itself. In the case where the RMS of the variables $\delta x$ grows linearly with x, the model becomes multiplicative, in the sense that one can write:

$$x(T) = x_0 \prod_{k=0}^{N-1} (1 + \eta_k), \qquad N = \frac{T}{\tau}, \tag{2.2}$$

where the returns $\eta_k$ have a fixed variance. This model is actually more commonly used in the financial literature. We will show that reality must be described by an intermediate model, which interpolates between a purely additive model, Eq. (2.1), and a multiplicative model, Eq. (2.2).


Studied assets 


The chosen stock index is the futures contract on the Standard and Poor's 500 (S&P 500) US stock index, traded on the Chicago Mercantile Exchange (CME). During the time period chosen (from November 1991 to February 1995), the index rose from 375 to 480 points (Fig. 2.1 (top)). Qualitatively, all the conclusions reached on this period of time are more generally valid, although the value of some parameters (such as the volatility) can change significantly from one period to the next.


The exchange rate is the US dollar ($) against the German mark (DEM), which is the most active exchange rate market in the world. During the analysed period, the mark varied


2 Alternatively, a description in terms of Student distributions is often found to be of comparable quality, with a tail exponent $\mu \sim 3$-$5$ for the S&P 500, for example.
Fig. 2.1. Charts of the studied assets between November 1991 and February 1995. The top chart is the S&P 500, the middle one is the DEM/$, and the bottom one is the long-term German interest rate (Bund).


between 58 and 75 cents (Fig. 2.1 (middle)). Since the interbank settlement prices are not 
available, we have defined the price as the average between the bid and the ask prices. 


Finally, the chosen interest rate index is the futures contract on long-term German bonds 
(Bund), quoted on the London International Financial Futures and Options Exchange 
(LIFFE). It is typically varying between 85 and 100 points (Fig. 2.1 (bottom)). 


3 There is, on all financial markets, a difference between the bid price and the ask price for a certain asset at a given instant of time. The difference between the two is called the 'bid/ask spread'. The more liquid a market, the smaller the average spread.




The indices S&P 500 and Bund that we have studied are thus actually futures contracts (cf. Section 4.2). The fluctuations of futures prices follow in general those of the underlying contract and it is reasonable to identify the statistical properties of these two objects. Futures contracts exist with several fixed maturity dates. We have always chosen the most liquid maturity and suppressed the artificial difference of prices when one changes from one maturity to the next (roll). We have also neglected the weak dependence of the futures contracts on the short time interest rate (see Section 4.2): this trend is completely masked by the fluctuations of the underlying contract itself.


2.2 Second-order statistics 
2.2.1 Variance, volatility and the additive-multiplicative crossover 


In all that follows, the notation $\delta x$ represents the difference of value of the asset X between two instants separated by a time interval $\tau$:

$$\delta x_k = x(t + \tau) - x(t), \qquad t \equiv k\tau. \tag{2.3}$$

In the whole modern financial literature, it is postulated that the relevant variable is not the increment $\delta x$ itself, but rather the return $\eta = \delta x / x$. It is therefore interesting to study empirically the variance of $\delta x$, conditioned to a certain value of the price x itself, which we shall denote $\langle \delta x^2 \rangle|_x$. If the return $\eta$ is the natural random variable, one should observe that $\sqrt{\langle \delta x^2 \rangle|_x} = \sigma_1 x$, where $\sigma_1$ is constant (and equal to the RMS of $\eta$). Now, in many instances (Figs 2.2 and 2.4), one rather finds that $\sqrt{\langle \delta x^2 \rangle|_x}$ is independent of x, apart from the case of exchange rates between comparable currencies. The case of the CAC 40 is particularly interesting, since during the period 1991-95, the index went from 1400 to 2100, leaving the absolute volatility nearly constant (if anything, it is seen to decrease with x!).

On longer time scales, however, or when the price x rises substantially, the RMS of $\delta x$ increases significantly, so as to become proportional to x (Fig. 2.3). A way to model this crossover from an additive to a multiplicative behaviour is to postulate that the RMS of the increments progressively (over a time scale $T_\sigma$) adapts to the changes of price of x. Schematically, for $T \ll T_\sigma$, the prices behave additively, whereas for $T \gg T_\sigma$, multiplicative effects start playing a significant role:⁴

$$\langle (x(T) - x_0)^2 \rangle = DT \quad (T \ll T_\sigma); \qquad \left\langle \log^2 \frac{x(T)}{x_0} \right\rangle = \sigma^2 T \quad (T \gg T_\sigma). \tag{2.4}$$


4 In the additive regime, where the variance of the increments can be taken as a constant, we shall write $\langle \delta x^2 \rangle = \sigma_1^2 x_0^2 = D\tau$.
Fig. 2.2. RMS of the increments $\delta x$, conditioned to a certain value of the price x, as a function of x, for the three chosen assets. For the chosen period, only the exchange rate DEM/$ conforms to the idea of a multiplicative model: the straight line corresponds to the best fit $\sqrt{\langle \delta x^2 \rangle|_x} = \sigma_1 x$. The adequacy of the multiplicative model in this case is related to the symmetry $/DEM → DEM/$.


On liquid markets, this time scale is on the order of months. A convenient way to model this crossover is to introduce an additive random variable $\xi(T)$, and to represent the price x(T) as $x(T) = x_0 (1 + \xi(T)/q(T))^{q(T)}$. For $T \ll T_\sigma$, $q \to 1$, the price process is additive, whereas for $T \gg T_\sigma$, $q \to \infty$, which corresponds to the multiplicative limit.
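
The two limits of this interpolation can be checked numerically. The snippet below is only a sketch: the function name `price` and the numerical values used in the usage note are illustrative assumptions, not quantities taken from the data.

```python
import math

def price(x0, xi, q):
    """Interpolating form x(T) = x0 * (1 + xi/q)**q.

    q = 1 gives the additive model x0 * (1 + xi); for large q the
    expression tends to the multiplicative (log-normal type) limit
    x0 * exp(xi).
    """
    return x0 * (1.0 + xi / q) ** q
```

For instance, with x0 = 100 and xi = 0.1, `price(100.0, 0.1, 1.0)` gives the additive value 110, while increasing q moves the result monotonically towards `100.0 * math.exp(0.1)`.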
Fig. 2.3. RMS of the increments $\delta x$, conditioned to a certain value of the price x, as a function of x, for the S&P 500 for the 1985-98 time period.

Fig. 2.4. RMS of the increments $\delta x$, conditioned to a certain value of the price x, as a function of x, for the CAC 40 index for the 1991-95 period; it is quite clear that during that time period $\sqrt{\langle \delta x^2 \rangle|_x}$ was almost independent of x.


2.2.2 Autocorrelation and power spectrum 


The simplest quantity, commonly used to measure the correlations between price increments, is the temporal two-point correlation function $C_{k\ell}^{\tau}$, defined as:⁵

$$C_{k\ell}^{\tau} = \frac{1}{D\tau} \langle \delta x_k \delta x_\ell \rangle, \qquad D\tau = \langle \delta x_k^2 \rangle. \tag{2.5}$$


5 In principle, one should subtract the average value $\langle \delta x \rangle = m\tau = m_1$ from $\delta x$. However, if $\tau$ is small (for example equal to a day), $m\tau$ is completely negligible compared to $\sqrt{D\tau}$.
Fig. 2.5. Normalized correlation function $C_{k\ell}^{\tau}$ for the three chosen assets, as a function of the time difference $|k - \ell|\tau$, and for $\tau$ = 5 min. Up to 30 min, some weak but significant correlations do exist (of amplitude ~ 0.05). Beyond 30 min, however, the two-point correlations are not statistically significant.


Figure 2.5 shows this correlation function for the three chosen assets, and for $\tau$ = 5 min. For uncorrelated increments, the correlation function $C_{k\ell}^{\tau}$ should be equal to zero for $k \neq \ell$, with an RMS equal to $\sigma = 1/\sqrt{N}$, where N is the number of independent points used in the computation. Figure 2.5 also shows the $3\sigma$ error bars. We conclude that beyond 30 min, the two-point correlation function cannot be distinguished from zero. On less liquid markets, however, this correlation time is longer. On the US stock market, for example, this correlation time has significantly decreased between the 1960s and the 1990s.
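
The test described above can be sketched as follows. This is a minimal illustration, assuming NumPy; the function names are hypothetical, the ensemble average of Eq. (2.5) is replaced by a time average over a single series, and the mean increment is subtracted explicitly (the text notes that this correction is negligible for small time lags).

```python
import numpy as np

def normalized_autocorrelation(dx, max_lag):
    """Time-averaged estimate of the normalized correlation of increments
    for lags 1..max_lag (lag measured in units of the sampling time)."""
    dx = np.asarray(dx, dtype=float)
    dx = dx - dx.mean()               # remove the (small) average increment
    variance = np.mean(dx * dx)       # plays the role of D*tau
    return np.array([np.mean(dx[:-lag] * dx[lag:]) / variance
                     for lag in range(1, max_lag + 1)])

def three_sigma_band(n_points):
    """3-sigma error bar 3/sqrt(N) expected for uncorrelated increments."""
    return 3.0 / np.sqrt(n_points)
```

Lags at which the estimate stays inside plus or minus `three_sigma_band(len(dx))` cannot be distinguished from zero correlation at this level of confidence.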
Fig. 2.6. Normalized correlation function $C_{k\ell}^{\tau}$ for the three chosen assets, as a function of the time difference $|k - \ell|\tau$, now on a daily basis, $\tau$ = 1 day. The two horizontal lines at ±0.1 correspond to a $3\sigma$ error bar. No significant correlations can be measured.


On very short time scales, however, weak but significant correlations do exist. These correlations are however too small to allow profit making: the potential return is smaller than the transaction costs involved for such a high-frequency trading strategy, even for the operators having direct access to the markets (cf. Section 4.1.2). Conversely, if the transaction costs are high, one may expect significant correlations to exist on longer time scales.

We have performed the same analysis for the daily increments of the three chosen assets ($\tau$ = 1 day). Figure 2.6 reveals that the correlation function is
Fig. 2.7. Power spectrum $S(\omega)$ of the time series DEM/$, as a function of the frequency $\omega$. The spectrum is flat: for this reason one often speaks of white noise, where all the frequencies are represented with equal weights. This corresponds to uncorrelated increments.


always within $3\sigma$ of zero, confirming that the daily increments are not significantly correlated.


Power spectrum 


Let us briefly mention another equivalent way of presenting the same results, using the so-called power spectrum, defined as:

$$S(\omega) = \frac{1}{N} \left\langle \sum_{k,\ell=1}^{N} \delta x_k \delta x_\ell\, e^{i\omega(k - \ell)} \right\rangle. \tag{2.6}$$

The case of uncorrelated increments leads to a flat power spectrum, $S(\omega) = S_0$. Figure 2.7 shows the power spectrum of the DEM/$ time series, where no significant structure appears.
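
For a single realization, the double sum in Eq. (2.6) equals the squared modulus of the discrete Fourier transform of the increments divided by N, so it can be estimated with a fast Fourier transform. The sketch below assumes NumPy and evaluates the spectrum only at the discrete frequencies $\omega_m = 2\pi m/N$; the name `periodogram` is chosen for this example, and no ensemble average is performed.

```python
import numpy as np

def periodogram(dx):
    """Single-realization estimate of S(omega) at omega_m = 2*pi*m/N.

    Uses the identity
        (1/N) * sum_{k,l} dx_k dx_l exp(i*omega*(k-l)) = |sum_k dx_k exp(i*omega*k)|^2 / N.
    Only non-negative frequencies are returned (the input is real).
    """
    dx = np.asarray(dx, dtype=float)
    n = dx.size
    amplitudes = np.fft.rfft(dx)
    omega = 2.0 * np.pi * np.arange(amplitudes.size) / n
    return omega, np.abs(amplitudes) ** 2 / n
```

For uncorrelated increments the values fluctuate around a constant level; averaging neighbouring frequencies, or several sub-series, reduces the scatter of the estimate.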


2.3 Temporal evolution of fluctuations
2.3.1 Temporal evolution of probability distributions 


The results of the previous section are compatible with the simplest scenario where the price increments $\delta x_k$ are, beyond a certain correlation time, independent random variables. A much finer test of this assumption consists in studying directly the probability distributions of the price increments $x_N - x_0 = \sum_{k=0}^{N-1} \delta x_k$ on different time scales $N = T/\tau$. If the increments are independent, then the distributions on different time scales can be obtained from the one pertaining to



Table 2.1. Value of the parameters A and $\alpha^{-1}$, as obtained by fitting the data with a symmetric TLD $L_\mu^{(t)}$, of index $\mu = 3/2$. Note that both A and $\alpha^{-1}$ have the dimension of a price variation $\delta x_1$, and therefore directly characterize the nature of the statistical fluctuations. The other columns compare the RMS and the kurtosis of the fluctuations, as directly measured on the data, or via the formulae, Eqs (1.94), (1.95). Note that in the case DEM/$, the studied variable is $100\,\delta x/x$. In this last case, the fit with $\mu = 1.5$ is not very good: the calculated kurtosis is found to be too high. A better fit is obtained with $\mu = 1.2$.


Asset      A        $\alpha^{-1}$   Variance $\sigma_1^2$       Kurtosis $\kappa_1$
                                    Measured    Computed        Measured    Computed
S&P 500    0.22     2.21            0.280       0.279           12.7        13.1
Bund       0.0091   0.275           0.00240     0.00242         20.4        23.5
DEM/$      0.0447   0.96            0.0163      0.0164          20.5        41.9
the elementary time scale $\tau$ (chosen to be larger than the correlation time). More precisely (see Section 1.5.1), one should have $P(x, N) = [P_1(\delta x_1)]^{\star N}$.


The elementary distribution $P_1$


The elementary cumulative probability distribution $P_{1>}(\delta x)$ is represented in Figures 2.8, 2.9 and 2.10. One should notice that the tail of the distribution is broad, in any case broader than a Gaussian. A fit using a truncated Lévy distribution of index $\mu = 3/2$, as given by Eq. (1.23), is quite satisfying.⁶ The corresponding parameters A and $\alpha$ are given in Table 2.1. (For $\mu = 3/2$, the relation between A and $a_{3/2}$ reads: $a_{3/2} = 2\sqrt{2\pi}\, A^{3/2}/3$.) Alternatively, as shown in Figure 1.5, a fit using a Student distribution would also be acceptable.

We have chosen to fix the value of $\mu$ to 3/2. This reduces the number of adjustable parameters, and is guided by the following observations:


• A large number of empirical studies on the use of Lévy distributions to fit the financial market fluctuations report values of $\mu$ in the range 1.6-1.8. However, in the absence of truncation (i.e. with $\alpha = 0$), the fit overestimates the tails of the distribution. Choosing a higher value of $\mu$ partly corrects for this effect, since it leads to a thinner tail.

• If the exponent $\mu$ is left as a free parameter, it is in many cases found to be in the range 1.4-1.6, although sometimes smaller, as in the case of the DEM/$ ($\mu \simeq 1.2$).


6 A more refined study of the tails actually reveals the existence of a small asymmetry, which we neglect here. Therefore, the skewness $\lambda_3$ is taken to be zero.
Fig. 2.8. Elementary cumulative distribution $P_{1>}(\delta x)$ (for $\delta x > 0$) and $P_{1<}(\delta x)$ (for $\delta x < 0$), for the S&P 500, with $\tau$ = 15 min. The thick line corresponds to the best fit using a symmetric TLD $L_\mu^{(t)}$, of index $\mu = 3/2$. We have also shown on the same graph the values of the parameters A and $\alpha^{-1}$ as obtained by the fit.


• The particular value $\mu = 3/2$ has a simple theoretical interpretation, which we shall briefly present in Section 2.8.


In order to characterize a probability distribution using empirical data, it is always better to work with the cumulative distribution function rather than with the distribution density. To obtain the latter, one indeed has to choose a certain width for the bins in order to construct frequency histograms, or to smooth the data using, for example, a Gaussian with a certain width. Even when this width is carefully chosen, part of the information is lost. It is furthermore difficult to characterize the tails of the distribution, corresponding to rare events, since most bins in this region are empty. On the other hand, the construction of the cumulative distribution does not require choosing a bin width. The trick is to order the observed data according to their rank, for example in decreasing order. The value $x_k$ of the
Fig. 2.9. Elementary cumulative distribution for the DEM/$, for $\tau$ = 15 min, and best fit using a symmetric TLD $L_\mu^{(t)}$, of index $\mu = 3/2$. In this case, it is rather $100\,\delta x/x$ that has been considered. The fit is not very good, and would have been better with a smaller value of $\mu \sim 1.2$. This increases the weight of very small variations.


kth variable (out of N) is then such that:

$$P_>(x_k) = \frac{k}{N + 1}. \tag{2.7}$$

This result comes from the following observation: if one draws an (N + 1)th random variable from the same distribution, there is an a priori equal probability 1/(N + 1) that it falls within any of the N + 1 intervals defined by the previously drawn variables. The probability that it falls above the kth one, $x_k$, is therefore equal to the number of intervals beyond $x_k$, which is equal to k, times 1/(N + 1). This is also equal, by definition, to $P_>(x_k)$. (See also the discussion in Section 1.4, and Eq. (1.45).) Since the rare events part of the distribution is of particular interest, it is convenient to choose a logarithmic scale for the probabilities. Furthermore, in order to check visually the symmetry of the probability
Fig. 2.10. Elementary cumulative distribution for the Bund, for $\tau$ = 15 min, and best fit using a symmetric TLD $L_\mu^{(t)}$, of index $\mu = 3/2$.


distributions, we have systematically used $P_<(-\delta x)$ for the negative increments, and $P_>(\delta x)$ for positive $\delta x$.
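
The rank-ordering construction of Eq. (2.7) takes only a few lines. The sketch below assumes NumPy; the function name `rank_cumulative` is chosen for this illustration.

```python
import numpy as np

def rank_cumulative(samples):
    """Empirical cumulative distribution by rank ordering, Eq. (2.7).

    Returns the observations sorted in decreasing order, x_1 >= x_2 >= ...,
    together with the estimate P_>(x_k) = k / (N + 1). No bin width is needed.
    """
    x = np.sort(np.asarray(samples, dtype=float))[::-1]
    k = np.arange(1, x.size + 1)
    return x, k / (x.size + 1.0)
```

Plotting the second array against the first on a logarithmic probability scale displays the tail directly; for the negative increments the same function can be applied to the sign-reversed data.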


Maximum likelihood 


Suppose that one observes a series of N realizations of the random iid variable X, $\{x_1, x_2, \ldots, x_N\}$, drawn with an unknown distribution that one would like to parameterize, for simplicity, by a single parameter $\mu$. If $P_\mu(x)$ denotes the corresponding probability distribution, the a priori probability to observe the particular series $\{x_1, x_2, \ldots, x_N\}$ is proportional to:

$$P_\mu(x_1) P_\mu(x_2) \cdots P_\mu(x_N). \tag{2.8}$$
The most likely value $\mu^*$ of $\mu$ is such that this a priori probability is maximized. Taking for example $P_\mu(x)$ to be a power-law distribution:

$$P_\mu(x) = \frac{\mu x_0^\mu}{x^{1+\mu}}, \qquad x > x_0, \tag{2.9}$$

(with $x_0$ known), one has:

$$P_\mu(x_1) P_\mu(x_2) \cdots P_\mu(x_N) \propto e^{N \log \mu + N \mu \log x_0 - (1 + \mu) \sum_{i=1}^{N} \log x_i}. \tag{2.10}$$

The equation fixing $\mu^*$ is thus, in this case:

$$\frac{N}{\mu^*} + N \log x_0 - \sum_{i=1}^{N} \log x_i = 0 \quad \Rightarrow \quad \mu^* = \frac{N}{\sum_{i=1}^{N} \log(x_i / x_0)}. \tag{2.11}$$

This method can be generalized to several parameters. In the above example, if $x_0$ is unknown, its most likely value is simply given by: $x_0 = \min\{x_1, x_2, \ldots, x_N\}$.
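
Equation (2.11) translates directly into code. The following is a minimal sketch, assuming NumPy and data that really follow the pure power law of Eq. (2.9) above $x_0$; the function name `power_law_mle` and its optional `x0` argument are choices made for this example.

```python
import numpy as np

def power_law_mle(samples, x0=None):
    """Maximum likelihood estimate of mu for the power law of Eq. (2.9).

    If x0 is not supplied, its most likely value min(samples) is used,
    as stated in the text. Returns the pair (mu_star, x0).
    """
    x = np.asarray(samples, dtype=float)
    if x0 is None:
        x0 = x.min()
    mu_star = x.size / np.sum(np.log(x / x0))   # Eq. (2.11)
    return mu_star, x0
```

On real price data the power law holds at best in the tail, so in practice one would apply such an estimator only to the observations beyond some threshold, and the result then depends on that threshold.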


Convolutions 


The parameterization of $P_1(\delta x)$ as a TLD allows one to reconstruct the distribution of price increments for all time intervals $T = N\tau$, if one assumes that the increments are iid random variables. As discussed in Chapter 1, one then has $P(\delta x, N) = [P_1(\delta x_1)]^{\star N}$. Figure 2.11 shows the cumulative distribution for T = 1 hour, 1 day and 5 days, reconstructed from the one at 15 min, according to the simple iid hypothesis. The symbols show empirical data corresponding to the same time intervals. The agreement is acceptable; one notices in particular the progressive deformation of $P(\delta x, N)$ towards a Gaussian for large N. The evolution of the variance and of the kurtosis as a function of N is given in Table 2.2, and compared with the results that one would observe if the simple convolution rule was obeyed, i.e. $\sigma_N^2 = N \sigma_1^2$ and $\kappa_N = \kappa_1/N$. For these liquid assets, the time scale $T^* = \kappa_1 \tau$ which sets the convergence towards the Gaussian is on the order of days. However, it is clear from Table 2.2 that this convergence is slower than it ought to be: $\kappa_N$ decreases much more slowly than the 1/N behaviour predicted by an iid hypothesis. A closer look at Figure 2.11 also reveals systematic deviations: for example the tails at 5 days are distinctively fatter than they should be.
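
The benchmark values $\sigma_N^2 = N\sigma_1^2$ and $\kappa_N = \kappa_1/N$ follow from the additivity of cumulants under convolution, which can be verified on any discretized elementary distribution. The sketch below assumes NumPy and an elementary distribution given as probabilities on an evenly spaced grid of unit step; the name `convolved_moments` and the discretization are assumptions of this illustration, not the procedure used for Figure 2.11.

```python
import numpy as np

def convolved_moments(pmf, n):
    """Variance and kurtosis of the n-fold convolution of a lattice distribution.

    pmf -- probabilities on consecutive grid points (unit spacing), summing to 1
    n   -- number of iid increments that are added
    """
    pmf = np.asarray(pmf, dtype=float)
    dist = pmf.copy()
    for _ in range(n - 1):
        dist = np.convolve(dist, pmf)      # distribution of one more iid increment
    grid = np.arange(dist.size)            # lattice positions of the sum
    mean = np.sum(dist * grid)
    var = np.sum(dist * (grid - mean) ** 2)
    kurt = np.sum(dist * (grid - mean) ** 4) / var ** 2 - 3.0
    return var, kurt
```

Comparing `convolved_moments(pmf, N)` with the moments measured on data at scale N is one way to quantify the deviations from the iid hypothesis discussed above.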


Tails, what tails? 


The asymptotic tails of the distributions $P(\delta x, N)$ are approximately exponential for all N. This is particularly clear for $T = N\tau$ = 1 day, as illustrated in Figure 2.12 in a semi-logarithmic plot. However, as mentioned in Section 1.3.4 and in the above paragraph, the distribution of price changes can also be satisfactorily fitted using Student distributions (which have power-law tails) with rather high exponents. In some cases, for example the distribution of losses of the S&P 500 (Fig. 2.12), one sees a slight upward bend in the plot of $P_>(x)$ versus x
Fig. 2.11. Evolution of the distribution of the price increments of the S&P 500, $P(\delta x, N)$
(symbols), compared with the result obtained by a simple convolution of the elementary
distribution $P_1(\delta x_1)$ (dark lines). The width and the general shape of the curves are rather
well reproduced within this simple convolution rule. However, systematic deviations can
be observed, in particular for large $|\delta x|$. This is also reflected by the fact that the kurtosis
$\kappa_N$ decreases more slowly than $\kappa_1/N$, cf. Table 2.2.


in a linear-log plot. This indeed suggests that the decay could be slower than
exponential. Many authors have proposed that the tails of the distribution of price
changes are a stretched exponential $\exp(-|\delta x|^c)$ with $c < 1$,⁷ or even a power-law
with an exponent $\mu$ in the range 3-5.⁸ For example, the most likely value of $\mu$


7 See: J. Laherrère, D. Sornette, Stretched exponential distributions in nature and in economy, European
Physical Journal B 2, 525 (1998).

8 See e.g. M. M. Dacorogna, U. A. Müller, O. V. Pictet, C. G. de Vries, The distribution of extremal
exchange rate returns in extremely large data sets, Olsen and Associates working paper (1995), available
at http://www.olsen.ch; F. Longin, The asymptotic distribution of extreme stock market returns, Journal of
Business 69, 383 (1996); P. Gopikrishnan, M. Meyer, L. A. Amaral, H. E. Stanley, Inverse cubic law for the
distribution of stock price variations, European Physical Journal B 3, 139 (1998).
Table 2.2. Variance and kurtosis of the distributions $P(\delta x, N)$ measured or
computed from the variance and kurtosis at time scale $\tau$ by assuming a simple
convolution rule, leading to $\sigma_N^2 = N\sigma_1^2$ and $\kappa_N = \kappa_1/N$. The kurtosis at scale $N$
is systematically too large, cf. Section 2.4. We have used $N = 4$ for $T = 1$ hour,
$N = 28$ for $T = 1$ day and $N = 140$ for $T = 5$ days.

                          Variance $\sigma_N^2$              Kurtosis $\kappa_N$
Asset                     Measured       Computed        Measured   Computed

S&P 500 (T = 1 h)         1.06           1.12            6.65       3.18
Bund (T = 1 h)            9.49 × 10^-3   9.68 × 10^-3    10.9       5.88
DEM/$ (T = 1 h)           6.03 × 10^-2   6.56 × 10^-2    7.20       5.11
S&P 500 (T = 1 day)       7.97           7.84            1.79       0.45
Bund (T = 1 day)          6.80 × 10^-2   6.76 × 10^-2    4.24       0.84
DEM/$ (T = 1 day)         0.477          0.459           1.68       0.73
S&P 500 (T = 5 days)      38.6           39.20           1.85       0.09
Bund (T = 5 days)         0.341          0.338           1.72       0.17
DEM/$ (T = 5 days)        2.52           2.30            0.91       0.15


using a Student distribution to fit the daily variations of the S&P in the period
1991-95 is $\mu = 5$. Even if it is rather hard to distinguish empirically between an
exponential and a high power-law, this question is very important theoretically. In
particular, the existence of a finite kurtosis requires $\mu$ to be larger than 4. As far
as applications to risk control, for example, are concerned, the difference between
the extrapolated values of the risk using an exponential or a high power-law fit of
the tails of the distribution is significant, but not dramatic. For example, fitting the
tail of an exponential distribution by a power-law, using 1000 days, leads to an
effective exponent $\mu \simeq 4$. An extrapolation to the most probable drop in 10000
days overestimates the true figure by a factor 1.3. In any case, the amplitude of the
very large crashes observed in the century is beyond any reasonable extrapolation
of the tails, whether one uses an exponential or a high power-law. The a priori
probability of observing a 22% drop in a single day, as happened on the New York
Stock Exchange in October 1987, is found in any case to be much smaller than $10^{-4}$
per day, that is, once every 40 years (at roughly 250 trading days per year). This
suggests that major crashes are governed by a specific amplification mechanism, which
drives these events outside the scope of a purely statistical analysis, and requires a
specific theoretical description.⁹


9 On this point, see A. Johansen, D. Sornette, Stock market crashes are outliers, European Physical Journal
B 1, 141 (1998), and J.-P. Bouchaud, R. Cont, European Physical Journal B 6, 543 (1998).
Fig. 2.12. Cumulative distribution of the price increments (positive and negative), on the
scale of $N = 1$ day, for the three studied assets, and in a linear-log representation. One
clearly sees the approximate exponential nature of the tails, which are straight lines in this
representation.


2.3.2 Multiscaling — Hurst exponent (*) 


The fact that the autocorrelation function is zero beyond a certain time scale
$\tau$ implies that the quantity $\langle [x_N - x_0]^2 \rangle$ grows as $D\tau N$. However, measuring
the temporal fluctuations using solely this quantity can be misleading. Take for
example the case where the price increments $\delta x_k$ are independent, but distributed
according to a TLD of index $\mu < 2$. As we have explained in Section 1.6.5,
the sum $x_N - x_0 = \sum_{k=0}^{N-1} \delta x_k$ behaves, as long as $N \ll N^* = \kappa_1$, as a
pure ‘Lévy’ sum, for which the truncation is inessential. Its order of magnitude
is therefore $x_N - x_0 \sim A N^{1/\mu}$, where $A^\mu$ is the tail parameter of the Lévy
distribution. However, the second moment $\langle [x_N - x_0]^2 \rangle = D\tau N$ is dominated by
extreme fluctuations, and is therefore governed by the existence of the exponential
truncation which gives a finite value to $D\tau$ proportional to $A^\mu \alpha^{\mu-2}$. One can check
that as long as $N \ll N^*$, one has $\sqrt{D\tau N} \gg A N^{1/\mu}$. This means that in this case,
the second moment overestimates the amplitude of probable fluctuations. One can
generalize the above result to the $q$th moment of the price difference, $\langle [x_N - x_0]^q \rangle$.
If $q > \mu$, one finds that all moments grow like $N$ in the regime $N \ll N^*$, and
like $N^{q/\mu}$ if $q < \mu$. This is to be contrasted with the sum of Gaussian variables,
where $\langle [x_N - x_0]^q \rangle$ grows as $N^{q/2}$ for all $q > 0$. More generally, one can define
an exponent $\zeta_q$ as $\langle [x_N - x_0]^q \rangle \propto N^{\zeta_q}$. If $\zeta_q/q$ is not constant with $q$, one speaks
of multiscaling. It is not always easy to distinguish true multiscaling from apparent
multiscaling, induced by crossover or finite size effects. For example, in the case
where one sums uncorrelated random variables with a long-range correlation in the
variance, one finds that the kurtosis decays slowly, as $\kappa_N \propto N^{-\nu}$, where $\nu < 1$ is
the exponent governing the decay of the correlations. This means that the fourth
moment of the difference $x_N - x_0$ behaves as:


\[ \langle [x_N - x_0]^4 \rangle = (D\tau N)^2 [3 + \kappa_N] \sim N^2 + N^{2-\nu}. \tag{2.12} \]


If $\nu$ is small, one can fit the above expression, over a finite range of $N$, using an
effective exponent $\zeta_4 < 2$, suggesting multiscaling. Similarly, higher moments can
be accurately fitted using an effective exponent $\zeta_q < q/2$.¹⁰ This is certainly a
possibility that one should keep in mind, in particular when analysing financial
time series (see Mandelbrot, 1998).
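
The apparent multiscaling described above can be illustrated with a small Python sketch. It assumes $D\tau = 1$ and takes $\kappa_N = 20\,N^{-0.43}$, the values quoted for the Bund in the caption of Fig. 2.14; the function names and the range $N = 1$ to $1024$ are illustrative choices only.

```python
import math


def fourth_moment(n, nu=0.43, kappa_amp=20.0):
    """(D tau N)^2 [3 + kappa_N] with D tau = 1 and kappa_N = kappa_amp * N**(-nu)."""
    return n ** 2 * (3.0 + kappa_amp * n ** (-nu))


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Fit a single power law over a finite range of N.
ns = [2 ** k for k in range(11)]
zeta_4_eff = loglog_slope(ns, [fourth_moment(n) for n in ns])
```

The fitted `zeta_4_eff` lies strictly between $2 - \nu$ and 2, so a pure power-law fit over this finite range would suggest $\zeta_4 < 2$ even though no genuine multiscaling is present.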

Another interesting way to characterize the temporal development of the fluctuations
is to study, as suggested by Hurst, the average distance between the ‘high’
and the ‘low’ in a window of size $t = n\tau$:


\[ \mathcal{H}(n) = \Big\langle \max(x_k)_{k=\ell+1,\ldots,\ell+n} - \min(x_k)_{k=\ell+1,\ldots,\ell+n} \Big\rangle_\ell. \tag{2.13} \]


The Hurst exponent $H$ is defined from $\mathcal{H}(n) \propto n^H$. In the case where the
increments $\delta x_k$ are Gaussian, one finds that $H = 1/2$ (for large $n$). In the case of
a TLD of index $1 < \mu < 2$, one finds:


\[ \mathcal{H}(n) \propto A n^{1/\mu} \quad (n \ll N^*), \qquad \mathcal{H}(n) \propto \sqrt{D\tau n} \quad (n \gg N^*). \tag{2.14} \]


The Hurst exponent therefore evolves from an anomalously high value $H = 1/\mu$
to the ‘normal’ value $H = 1/2$ as $n$ increases. Figure 2.13 shows the Hurst function
$\mathcal{H}(n)$ for the three liquid assets studied here. One clearly sees that the ‘local’
exponent $H$ slowly decreases from a high value ($\sim 0.7$, quite close to $1/\mu = 2/3$) at
small times, to $H \simeq 1/2$ at long times.
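
The Hurst function defined in the text, the average high-low range over windows of $n$ consecutive points, can be sketched as follows in Python. The function names, the window sizes and the simulated Gaussian walk are illustrative assumptions, not the authors' procedure or data.

```python
import math
import random


def hurst_function(x, n):
    """Average 'high minus low' over all windows of n consecutive points of x."""
    ranges = [max(x[l:l + n]) - min(x[l:l + n]) for l in range(len(x) - n + 1)]
    return sum(ranges) / len(ranges)


def local_hurst_exponent(x, n_small, n_large):
    """Slope of log H(n) versus log n between two window sizes."""
    ratio = hurst_function(x, n_large) / hurst_function(x, n_small)
    return math.log(ratio) / math.log(n_large / n_small)


# Synthetic check: a random walk with Gaussian increments.
random.seed(0)
walk = [0.0]
for _ in range(20000):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
h_gauss = local_hurst_exponent(walk, 16, 256)
```

For the Gaussian walk the local exponent comes out near 1/2, with a small upward bias at these modest window sizes; applying the same functions to a price series gives the local slope plotted in Fig. 2.13.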


10 For more details on this point, see: J.-P. Bouchaud, M. Potters, M. Meyer, Apparent multifractality in financial
time series, European Physical Journal B 13, 595 (2000).
Fig. 2.13. Hurst function $\mathcal{H}(n)$ (up to an arbitrary scaling factor) for the three liquid assets
studied in this chapter, in log-log coordinates. The local slope gives the value of the Hurst
exponent $H$. One clearly sees that this exponent goes from a rather high value for small $n$
to a value close to 1/2 when $n$ increases.


2.4 Anomalous kurtosis and scale fluctuations 


For in a minute there are many days. 


(Shakespeare, Romeo and Juliet.)


As mentioned above, one sees in Figure 2.11 that $P(\delta x, N)$ systematically deviates
from $[P_1(\delta x_1)]^{*N}$. In particular, the tails of $P(\delta x, N)$ are anomalously ‘fat’.
Equivalently, the kurtosis $\kappa_N$ of $P(\delta x, N)$ is higher than $\kappa_1/N$, as one can see
from Figure 2.14, where $\kappa_N$ is plotted as a function of $N$ in log-log coordinates.
Correspondingly, more complex correlation functions, such as that of the
squares of $\delta x_k$, reveal a non-trivial behaviour. An interesting quantity to consider
Fig. 2.14. Kurtosis $\kappa_N$ at scale $N$ as a function of $N$, for the Bund. In this case, the
elementary time scale $\tau$ is 30 min. If the iid assumption were true, one should find
$\kappa_N = \kappa_1/N$. The straight line has a slope of $-0.43$, which means that the decay of
the kurtosis $\kappa_N$ is much slower, as $\simeq 20/N^{0.43}$.


is the amplitude of the fluctuations, averaged over one day, defined as: 


\[ \gamma = \frac{1}{N_d} \sum_{k=1}^{N_d} |\delta x_k|, \tag{2.15} \]


where $\delta x_k$ is the 5-min increment, and $N_d$ is the number of 5-min intervals within
a day. This quantity is clearly strongly correlated in time (Figs 2.15 and 2.16): the
periods of strong volatility persist far beyond the day time scale.
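
A minimal Python sketch of this volatility proxy and of its normalized temporal correlation (the quantity plotted in Fig. 2.16) is given below. The function names are hypothetical, and the input is assumed to be a plain list of 5-min increments for one day, or a list of daily $\gamma$ values.

```python
def daily_volatility(intraday_increments):
    """gamma = average absolute value of the intraday price increments."""
    return sum(abs(dx) for dx in intraday_increments) / len(intraday_increments)


def volatility_autocorrelation(gammas, lag):
    """(<g_k g_{k+lag}> - <g>^2) / (<g^2> - <g>^2) for a list of daily gammas."""
    mean = sum(gammas) / len(gammas)
    variance = sum(g * g for g in gammas) / len(gammas) - mean ** 2
    pairs = list(zip(gammas, gammas[lag:]))
    covariance = sum(a * b for a, b in pairs) / len(pairs) - mean ** 2
    return covariance / variance
```

By construction the correlation equals 1 at zero lag; a slow decay with increasing lag is the signature of volatility persistence discussed in the text.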

A simple way to account for these effects is to assume that the elementary
distribution $P_1$ also depends on time. One actually observes that the level of activity
on a market (measured by the volume of transactions) on a given time interval
Fig. 2.15. Evolution of the ‘volatility’ $\gamma$ as a function of time, for the S&P 500 in the
period 1991-95. One clearly sees periods of large volatility, which persist in time.


can vary quite strongly with time. It is reasonable to think that the scale of the
fluctuations $\gamma$ of the price depends directly on the frequency and volume of the
transactions. A simple hypothesis is that apart from a change of this level of 
activity, the mechanisms leading to a change of price are the same, and therefore 
that the fluctuations have the same distributions, up to a change of scale. More 
precisely, we shall assume that the distribution of price changes is such that: 


\[ P_1(\delta x_k) = \frac{1}{\gamma_k} P_{10}\!\left( \frac{\delta x_k}{\gamma_k} \right), \tag{2.16} \]


where $P_{10}(u)$ is a certain distribution normalized to 1 and independent of $k$. The
factor $\gamma_k$ represents the scale of the fluctuations: one can define $\gamma_k$ and $P_{10}$ such
that $\int |u| P_{10}(u)\, du = 1$. The variance $D_k\tau$ is then proportional to $\gamma_k^2$.

In the case where $P_{10}$ is Gaussian and in the limit of continuous time, the model
defined by Eq. (2.16) is known in the literature as a ‘stochastic volatility’ model.
The model defined by Eq. (2.16) is however more general since $P_{10}$ is a priori
arbitrary.

If one assumes that the random variables $\delta x_k/\gamma_k$ are independent and of zero
mean, one can show (see Section 1.7.2 and Appendix A) that the average kurtosis
of the distribution $P(\delta x, N)$ is given, for $N \geq 1$, by:


\[ \kappa_N = \frac{1}{N} \left[ \kappa_0 + (3 + \kappa_0) g(0) + 6 \sum_{\ell=1}^{N} \left( 1 - \frac{\ell}{N} \right) g(\ell) \right], \tag{2.17} \]
Fig. 2.16. Temporal correlation function, $\langle \gamma_k \gamma_{k+\ell} \rangle - \langle \gamma_k \rangle^2$, normalized by $\langle \gamma_k^2 \rangle - \langle \gamma_k \rangle^2$.
The value $\ell = 1$ corresponds to an interval of 1 day. For comparison, we have shown a
decay as $1/\sqrt{\ell}$.


where $\kappa_0$ is the kurtosis of the distribution $P_{10}$ defined above (cf. Eq. (2.16)), and
$g$ the correlation function of the variance $D_k\tau$ (or of the $\gamma_k^2$):


\[ \overline{(D_k - \overline{D})(D_\ell - \overline{D})} = \overline{D}^{\,2}\, g(|\ell - k|). \tag{2.18} \]


The overbar means that one should average over the fluctuations of the $D_k$. It is
interesting to notice that even in the absence of ‘bare’ kurtosis ($\kappa_0 = 0$), volatility
fluctuations are enough to induce a non-zero kurtosis $\kappa_1 = \kappa_0 + (3 + \kappa_0) g(0)$.
The empirical data on the kurtosis are well accounted for using the above
formula, with the choice $g(\ell) \propto \ell^{-\nu}$, with $\nu = 0.43$ in the case of the Bund. This
choice for $g(\ell)$ is also in qualitative agreement with the decay of the correlations
of the $\gamma$'s (Fig. 2.16). However, a fit of the data using for $g(\ell)$ the sum of two
exponentials $\exp(-\ell/\ell_{1,2})$ is also acceptable. One finds two rather different time
scales: $\ell_1$ is shorter than a day, and a long correlation time $\ell_2$ of a few tens of
days.
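
The effect of volatility correlations on the kurtosis can be sketched numerically. The Python snippet below assumes that Eq. (2.17) has the form $\kappa_N = \frac{1}{N}[\kappa_0 + (3+\kappa_0) g(0) + 6\sum_{\ell=1}^{N}(1-\ell/N)\, g(\ell)]$, which reduces to the quoted $\kappa_1 = \kappa_0 + (3+\kappa_0) g(0)$ for $N = 1$. The amplitude `g0` of the power-law correlation is a hypothetical value chosen only for illustration; the exponent 0.43 is the one quoted for the Bund.

```python
def kurtosis_at_scale(n, kappa_0, g):
    """Average kurtosis at scale n, given the bare kurtosis kappa_0 and g(lag)."""
    tail = sum((1.0 - lag / n) * g(lag) for lag in range(1, n + 1))
    return (kappa_0 + (3.0 + kappa_0) * g(0) + 6.0 * tail) / n


def power_law_g(lag, g0=0.2, nu=0.43):
    """Illustrative correlation function: g(0) = g0 and g(lag) = g0 * lag**(-nu)."""
    return g0 if lag == 0 else g0 * lag ** (-nu)


kappa_1 = kurtosis_at_scale(1, 0.0, power_law_g)
kappa_100 = kurtosis_at_scale(100, 0.0, power_law_g)
```

With a slowly decaying $g$, `kappa_100` stays well above the iid prediction `kappa_1 / 100`, which is the anomalous decay described in the text. With $g \equiv 0$ the function returns $\kappa_0/N$.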

One can thus quite clearly see that the scale of the fluctuations (known in the 
market as the volatility) changes with time, with a rather long persistence time 
scale. This slow evolution of the volatility in turn leads to an anomalous decay
of the kurtosis $\kappa_N$ as a function of $N$. As we shall see in Section 4.3.4, this has
direct consequences for the dynamics of the volatility smile observed on option 
markets. 


2.5 Volatile markets and volatility markets 


We have considered, up to now, very liquid markets, where extreme price fluctuations
are rather rare. On less liquid/less mature markets, the probability of extreme
moves is much larger. The case of short-term interest rates is also interesting,
since the evolution of, say, the 3-month rate is directly affected by the decision
of central banks to increase or to decrease the day to day rate. As discussed
further in Section 2.6 below, this leads to a rather high kurtosis, related to the
fact that the short rate often does not change at all, but sometimes changes a lot.
The kurtosis of the US 3-month rate is on the order of 20 for daily moves (Fig.
2.17). Emerging markets (such as South American or Eastern European markets) are
obviously even wilder. The example of the Mexican peso (MXP) is interesting,
because the cumulative distribution of the daily changes of the rate MXP/$ reveals
power-law tails, with no obvious truncation, with an exponent $\mu = 1.5$ (Fig. 2.18).
This data-set corresponds to the years 1992-94, just before the crash of the peso
(December 1994). A similar value of $\mu$ has also been observed, for example, in the
fluctuations of the Budapest Stock Exchange.¹¹

Another interesting quantity is the volatility itself, which varies with time, as
emphasized above. The price of options reflects quite accurately the value of the
historical volatility in a recent past (see Section 4.3.4). Therefore, the volatility
can be considered as a special type of asset, which one can study as such. We
shall define as above the volatility $\gamma_k$ as the average over a day of the absolute
value of the 5-min price increments. The autocorrelation function of the $\gamma$'s is
shown in Figure 2.16; it is found to decrease slowly, perhaps as a power-law
with an exponent $\nu$ in the range 0.1 to 0.5 (Fig. 2.16).¹² The distribution of the


11 J. Rotyis, G. Vattay, Statistical analysis of the stock index of the Budapest Stock Exchange, in [Kondor and
Kertész].

12 On this point, see, e.g. [Ding, Arneodo], and Y. Liu et al., The statistical properties of volatility of price
fluctuations, Physical Review E 60, 1390 (1999).
Fig. 2.17. Cumulative distribution $P_{1>}(\delta x)$ (for $\delta x > 0$) and $P_{1<}(\delta x)$ (for $\delta x < 0$), for
the US 3-month rate (US T-Bills from 1987 to 1996), with $\tau = 1$ day. The thick line
corresponds to the best fit using a symmetric TLD $L_\mu^{(t)}$ of index $\mu = 3/2$. We have also
shown the corresponding values of $A$ and $\alpha^{-1}$, which give a kurtosis equal to 22.6.


measured volatility $\gamma$ is shown in Figure 2.19 for the S&P 500, but other assets
lead to similar curves. This distribution decreases slowly for large $\gamma$'s, again as an
exponential or a high power-law. Several functional forms have been suggested,
such as a log-normal distribution, or an inverse Gamma distribution (see Section
2.9 for a specific model for this behaviour). However, one must keep in mind that
the quantity $\gamma$ is only an approximation for the ‘true’ volatility. The distribution
shown in Figure 2.19 is therefore the convolution of the true distribution with a
measurement error distribution.
Fig. 2.18. Cumulative distribution $P_{1>}(\delta x)$ (for $\delta x > 0$) and $P_{1<}(\delta x)$ (for $\delta x < 0$), for
the Mexican peso versus $, with $\tau = 1$ day. The data correspond to the years 1992-94.
The thick line shows a power-law decay, with a value of $\mu = 3/2$. The extrapolation to 10
years gives a most probable worst day on the order of $-40\%$.


2.6 Statistical analysis of the forward rate curve (*) 




The case of the interest rate curve is particularly complex and interesting, since it 
is not the random motion of a point, but rather the consistent history of a whole 
curve (corresponding to different loan maturities) which is at stake. The need for 
a consistent description of the whole interest rate curve is furthermore enhanced 
by the rapid development of interest rate derivatives (options, swaps,¹³ options on
swaps, etc.) [Hull].


13 A swap is a contract where one exchanges fixed interest rate payments with floating interest rate payments.
Fig. 2.19. Cumulative distribution of the measured volatility $\gamma$ of the S&P, $P_>(\gamma)$, in a
linear-log plot. Note that the tail of this distribution decays more slowly than exponentially.


Present models of the interest rate curve fall into two categories: the first
one is the Vasicek model and its variants, which focuses on the dynamics of
the short-term interest rate, from which the whole curve is reconstructed.¹⁴ The
second one, initiated by Heath, Jarrow and Morton, takes the full forward rate
curve as dynamic variables, driven by (one or several) continuous-time Brownian
motion, multiplied by a maturity-dependent scale factor. Most models are however
primarily motivated by their mathematical tractability rather than by their ability to
describe the data. For example, the fluctuations are often assumed to be Gaussian,
thereby neglecting ‘fat tail’ effects.


Our aim in this section is not to discuss these models in any detail, but 
rather to present an empirical study of the forward rate curve (FRC), where we 
isolate several important qualitative features that a good model should be asked
to reproduce.¹⁵ Some intuitive arguments are proposed to relate the series of
observations reported below. 


2.6.1 Presentation of the data and notations 


The forward interest rate curve (FRC) at time $t$ is fully specified by the collection
of all forward rates $f(t, \theta)$, for different maturities $\theta$. It allows us for example to
calculate the price $B(t, \theta)$ at time $t$ of a (so-called ‘zero-coupon’) bond, which by
14 For a compilation of the most important theoretical papers on the interest rate curve, see: [Hughston].

15 This section is based on the following papers: J.-P. Bouchaud, N. Sagna, R. Cont, N. El-Karoui, M. Potters,
Phenomenology of the interest rate curve, Applied Mathematical Finance, 6, 209 (1999) and idem, Strings
attached, Risk Magazine, 11 (7), 56 (1998).
definition pays 1 at time $t + \theta$. The forward rates are by definition such that they
compound to give $B(t, \theta)$:


\[ B(t, \theta) = \exp\left( -\int_0^{\theta} f(t, u)\, du \right). \tag{2.19} \]


$r(t) = f(t, \theta = 0)$ is called the ‘spot rate’. Note that in the following $\theta$ is always
a time difference; the maturity date $T$ is $t + \theta$.
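
As a numerical illustration of Eq. (2.19), the following Python sketch integrates a forward curve with the trapezoidal rule to obtain a zero-coupon price. The function name, the step count and the example curves are assumptions made for illustration; rates are expressed as fractions per year and $\theta$ in years.

```python
import math


def zero_coupon_price(forward_rate, theta, steps=1000):
    """B(t, theta) = exp(-integral_0^theta f(t, u) du), by the trapezoidal rule.

    forward_rate is a function u -> f(t, u) for a fixed date t.
    """
    h = theta / steps
    integral = 0.5 * (forward_rate(0.0) + forward_rate(theta))
    integral += sum(forward_rate(i * h) for i in range(1, steps))
    return math.exp(-h * integral)


flat_price = zero_coupon_price(lambda u: 0.05, 2.0)
rising_price = zero_coupon_price(lambda u: 0.04 + 0.01 * u, 2.0)
```

Both example curves integrate to 0.10 over two years, so each price equals $e^{-0.10}$. The forward rate can be recovered from the prices as $f(t, \theta) = -\partial \log B(t, \theta)/\partial\theta$.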

Our study is based on a data-set of daily prices of Eurodollar futures contracts on
interest rates.¹⁶ The interest rate underlying the Eurodollar futures contract is a 90-day
rate, earned on dollars deposited in a bank outside the US by another bank. The
interest in studying forward rates rather than yield curves is that one has a direct
access to a ‘derivative’ (in the mathematical sense: $f(t, \theta) = -\partial \log B(t, \theta)/\partial\theta$),
which obviously contains more precise information than the yield curve (defined
from the logarithm of $B(t, \theta)$) itself.

In practice, the futures markets price 3-month forward rates for fixed expiration
dates, separated by 3-month intervals. Identifying 3-month futures rates with
instantaneous forward rates, we have available a sequence of time series on forward
rates $f(t, T_i - t)$, where $T_i$ are fixed dates (March, June, September and December
of each year). We can convert these into fixed maturity (multiple of 3 months)
forward rates by a simple linear interpolation between the two nearest points such
that $T_i - t \leq \theta \leq T_{i+1} - t$. Between 1990 and 1996, one has at least 15 different
Eurodollar maturities for each market date. Between 1994 and 1996, the number of
available maturities rises to 30 (as time grows, longer and longer maturity forward
rates are being traded on futures markets); we shall thus often use this restricted
data-set. Since we only have daily data, our reference time scale will be $\tau = 1$ day.
The variation of $f(t, \theta)$ between $t$ and $t + \tau$ will be denoted as $\delta f(t, \theta)$:


\[ \delta f(t, \theta) = f(t + \tau, \theta) - f(t, \theta). \tag{2.20} \]
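
The conversion to fixed maturities and the daily increments of Eq. (2.20) can be sketched as follows in Python. The function names and the numerical values are hypothetical; the two contract maturities `theta_i` and `theta_next` stand for $T_i - t$ and $T_{i+1} - t$, expressed in years.

```python
def fixed_maturity_rate(theta, theta_i, rate_i, theta_next, rate_next):
    """Linear interpolation of the forward rate at maturity theta between the
    two nearest traded contracts, with theta_i <= theta <= theta_next."""
    if not theta_i <= theta <= theta_next:
        raise ValueError("theta must lie between the two contract maturities")
    weight = (theta - theta_i) / (theta_next - theta_i)
    return (1.0 - weight) * rate_i + weight * rate_next


def daily_increments(series):
    """delta f(t, theta) = f(t + tau, theta) - f(t, theta) for a daily series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]
```

For instance, with contracts at maturities 0.25 and 0.5 years quoting 5.0% and 6.0%, the interpolated rate at 0.375 years is 5.5%.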


2.6.2 Quantities of interest and data analysis 
The description of the FRC has two, possibly interrelated, aspects: 


(i) What is, at a given instant of time, the shape of the FRC as a function of
the maturity $\theta$?
(ii) What are the statistical properties of the increments $\delta f(t, \theta)$ between time
$t$ and time $t + \tau$, and how are they correlated with the shape of the FRC at
time $t$?


16 In principle forward contracts and futures contracts are not strictly identical (they have different margin
requirements), and one may expect slight differences, which we shall neglect in the following.
Fig. 2.20. The historical time series of the spot rate $r(t)$ from 1990 to 1996 (top curve),
actually corresponding to a 3-month future rate (dark line), and of the ‘spread’ $s(t)$ (bottom
curve), defined with the longest maturity available over the whole period 1990-96 on futures
markets, i.e. $\theta_{\max} = 4$ years.


The two basic quantities describing the FRC at time $t$ are the value of the short-term
interest rate $f(t, \theta_{\min})$ (where $\theta_{\min}$ is the shortest available maturity), and that
of the short-term/long-term spread $s(t) = f(t, \theta_{\max}) - f(t, \theta_{\min})$, where $\theta_{\max}$ is the
longest available maturity. The two quantities $r(t) \simeq f(t, \theta_{\min})$, $s(t)$ are plotted
versus time in Figure 2.20;¹⁷ note that:


• The volatility $\sigma$ of the spot rate $r(t)$ is equal to $0.8\%/\sqrt{\text{year}}$.¹⁸ This is obtained by
averaging over the whole period.

• The spread $s(t)$ has varied between 0.53 and 4.34%. Contrary to some
European interest rates over the same period, $s(t)$ has always remained positive.
(This however does not mean that the FRC is increasing monotonically, see
below.)


Figure 2.21 shows the average shape of the FRC, determined by averaging
the difference $f(t, \theta) - r(t)$ over time. Interestingly, this function is rather well
fitted by a simple square-root law. This means that on average, the difference
between the forward rate with maturity $\theta$ and the spot rate is equal to $a\sqrt{\theta}$, with a
proportionality constant $a = 0.85\%/\sqrt{\text{year}}$ which turns out to be nearly identical
to the spot rate volatility. We shall propose a simple interpretation of this fact below.
17 We shall from now on take the 3-month rate as an approximation to the spot rate $r(t)$.

18 The dimension of $r$ should really be % per year, but we conform here to the habit of quoting $r$ simply in %.
Note that this can sometimes be confusing when checking the correct dimensions of a formula.
Fig. 2.21. The average FRC in the period 1994-96, as a function of the maturity $\theta$. We
have shown for comparison a one parameter fit with a square-root law, $a(\sqrt{\theta} - \sqrt{\theta_{\min}})$.
The same $\sqrt{\theta}$ behaviour actually extends up to $\theta_{\max} = 10$ years, which is available in the
second half of the time period.


Let us now turn to an analysis of the fluctuations around the average shape. These
fluctuations are actually similar to those of a vibrating elastic string. The average
deviation $\Delta(\theta)$ can be defined as:


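\[ \Delta(\theta) \equiv \sqrt{ \left\langle \left[ f(t, \theta) - r(t) - s(t) \sqrt{\frac{\theta}{\theta_{\max}}} \right]^2 \right\rangle }, \tag{2.21} \]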


and is plotted in Figure 2.22, for the period 1994-96. The maximum of $\Delta$ is reached
for a maturity of $\theta^* = 1$ year.

We now turn to the statistics of the daily increments $\delta f(t, \theta)$ of the forward rates,
by calculating their volatility $\sigma(\theta) = \sqrt{\langle \delta f(t, \theta)^2 \rangle}$ and their excess kurtosis

\[ \kappa(\theta) = \frac{\langle \delta f(t, \theta)^4 \rangle}{\sigma^4(\theta)} - 3. \tag{2.22} \]


A very important quantity will turn out to be the following ‘spread’ correlation
function:

\[ C(\theta) = \frac{\big\langle \delta f(t, \theta_{\min}) \big( \delta f(t, \theta) - \delta f(t, \theta_{\min}) \big) \big\rangle}{\sigma^2(\theta_{\min})}, \tag{2.23} \]

which measures the influence of the short-term interest fluctuations on the other
modes of motion of the FRC, subtracting away any trivial overall translation of the
FRC.
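
The three statistics just defined, the volatility $\sigma(\theta)$, the excess kurtosis $\kappa(\theta)$ and the spread correlation $C(\theta)$, can be computed from series of daily increments as in the Python sketch below. The function names are hypothetical, and the inputs are assumed to be equal-length lists of $\delta f(t, \theta_{\min})$ and $\delta f(t, \theta)$ on the same dates.

```python
import math


def mean(values):
    return sum(values) / len(values)


def volatility(df):
    """sigma(theta) = sqrt(<df^2>)."""
    return math.sqrt(mean([d * d for d in df]))


def excess_kurtosis(df):
    """kappa(theta) = <df^4> / sigma^4 - 3."""
    second = mean([d * d for d in df])
    return mean([d ** 4 for d in df]) / second ** 2 - 3.0


def spread_correlation(df_short, df_theta):
    """C(theta) = <df_short * (df_theta - df_short)> / sigma^2(theta_min)."""
    numerator = mean([a * (b - a) for a, b in zip(df_short, df_theta)])
    return numerator / mean([a * a for a in df_short])
```

A pure translation of the curve (identical increments at both maturities) gives $C = 0$, whereas increments at maturity $\theta$ that are twice those of the short rate give $C = 1$.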




Fig. 2.22. Root mean square deviation $\Delta(\theta)$ from the average FRC as a function of $\theta$. Note
the maximum for $\theta^* = 1$ year, for which $\Delta \simeq 0.38\%$. We have also plotted the correlation
function $C(\theta)$ (defined by Eq. (2.23)) between the daily variation of the spot rate and that of
the forward rate at maturity $\theta$, in the period 1994-96. Again, $C(\theta)$ is maximum for $\theta = \theta^*$,
and decays rapidly beyond.


Figure 2.23 shows $\sigma(\theta)$ and $\kappa(\theta)$. Somewhat surprisingly, $\sigma(\theta)$, much like
$\Delta(\theta)$, has a maximum around $\theta^* = 1$ year. The order of magnitude of $\sigma(\theta)$ is
$0.05\%/\sqrt{\text{day}}$, or $0.8\%/\sqrt{\text{year}}$. The daily kurtosis $\kappa(\theta)$ is rather high (on the order
of 5), and only weakly decreasing with $\theta$.

Finally, $C(\theta)$ is shown in Figure 2.22; its shape is again very similar to those of
$\Delta(\theta)$ and $\sigma(\theta)$, with a pronounced maximum around $\theta^* = 1$ year. This means that
the fluctuations of the short-term rate are amplified for maturities around 1 year.
We shall come back to this important point below.


2.6.3 Comparison with the Vasicek model


The simplest FRC model is a one-factor model due to Vasicek, where the whole
term structure can be ascribed to the short-term interest rate. The latter is assumed
to follow a so-called ‘Ornstein–Uhlenbeck’ (or mean reverting) process defined as:

\[ \frac{dr(t)}{dt} = \Omega (r_0 - r(t)) + \sigma \xi(t), \tag{2.24} \]

where $r_0$ is an ‘equilibrium’ reference rate, $\Omega$ describes the strength of the
reversion towards $r_0$ (and is the inverse of the mean reversion time), and $\xi(t)$ is
a Gaussian noise, of volatility 1. In its simplest version, the Vasicek model prices
Fig. 2.23. The daily volatility and kurtosis as a function of maturity. Note the maximum of
the volatility for $\theta = \theta^*$, while the kurtosis is rather high, and only very slowly decreasing
with $\theta$. The two curves correspond to the periods 1990-96 and 1994-96, the latter period
extending to longer maturities.


a bond maturing at $T$ as the following average:

\[ B(t, T) = \left\langle \exp\left( -\int_t^T r(u)\, du \right) \right\rangle, \tag{2.25} \]

where the averaging is over the possible histories of the spot rate between now and
the maturity, where the uncertainty is modelled by the noise $\xi$. The computation of
the above average is straightforward when $\xi$ is Gaussian, and leads to (using Eq.
(2.19)):


\[ f(t, \theta) = r(t) + (r_0 - r(t)) (1 - e^{-\Omega\theta}) - \frac{\sigma^2}{2\Omega^2} (1 - e^{-\Omega\theta})^2. \tag{2.26} \]
The basic results of this model are as follows:


• Since $\langle r_0 - r(t) \rangle = 0$, the average of $f(t, \theta) - r(t)$ is given by

\[ \langle f(t, \theta) - r(t) \rangle = -\frac{\sigma^2}{2\Omega^2} (1 - e^{-\Omega\theta})^2, \tag{2.27} \]

and should thus be negative, at variance with empirical data. Note that in the
limit $\Omega\theta \ll 1$, the order of magnitude of this (negative) term is very small:
taking $\sigma = 1\%/\sqrt{\text{year}}$ and $\theta = 1$ year, it is found to be equal to 0.005%, much
smaller than the typical differences actually observed on forward rates.

• The volatility $\sigma(\theta)$ is monotonically decreasing as $\exp(-\Omega\theta)$, while the kurtosis
$\kappa(\theta)$ is identically zero (because $\xi$ is Gaussian).

• The correlation function $C(\theta)$ is negative and is a monotonic decreasing function
of its argument, in total disagreement with observations (Fig. 2.22).

• The variation of the spread $s(t)$ and of the spot rate should be perfectly
correlated, which is not the case (Fig. 2.22): more than one factor is in any case
needed to account for the deformation of the FRC.
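
The order of magnitude quoted in the first point can be checked with a short Python sketch. It assumes the Vasicek forward rate takes the form $f(t, \theta) = r + (r_0 - r)(1 - e^{-\Omega\theta}) - \frac{\sigma^2}{2\Omega^2}(1 - e^{-\Omega\theta})^2$; the function names and the value $\Omega = 10^{-3}$ per year (chosen only to sit in the limit $\Omega\theta \ll 1$) are illustrative. Rates are expressed as fractions, so 1% is 0.01.

```python
import math


def vasicek_forward(theta, r, r0, omega, sigma):
    """Vasicek forward rate at maturity theta, for spot rate r."""
    decay = 1.0 - math.exp(-omega * theta)
    return r + (r0 - r) * decay - sigma ** 2 / (2.0 * omega ** 2) * decay ** 2


def vasicek_average_slope(theta, omega, sigma):
    """<f(t, theta) - r(t)>, obtained by setting r = r0 (here both 0)."""
    return vasicek_forward(theta, r=0.0, r0=0.0, omega=omega, sigma=sigma)


# sigma = 1%/sqrt(year), theta = 1 year, weak mean reversion.
slope = vasicek_average_slope(1.0, omega=1e-3, sigma=0.01)
```

The result is about $-5 \times 10^{-5}$, i.e. $-0.005\%$, which is negative and far smaller in magnitude than the observed average difference of order $0.85\%\,\sqrt{\theta}$.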


An interesting extension of Vasicek’s model designed to fit exactly the ‘initial’ FRC
$f(t = 0, \theta)$ was proposed by Hull and White [Hull]. It amounts to replacing the
above constants $\Omega$ and $r_0$ by time-dependent functions. For example, $r_0(t)$ represents the
anticipated evolution of the ‘reference’ short-term rate itself with time. These functions
can be adjusted to fit $f(t = 0, \theta)$ exactly. Interestingly, one can then derive the following
relation:

\[ \left\langle \frac{\partial r(t)}{\partial t} \right\rangle = \left\langle \frac{\partial f}{\partial \theta}(t, 0) \right\rangle, \tag{2.28} \]

up to a term of order $\sigma^2$ which turns out to be negligible, exactly for the same reason
as explained above. On average, the second term (estimated by taking a finite difference
estimate of the partial derivative using the first two points of the FRC) is definitely found
to be positive, and equal to 0.8%/year. On the same period (1990-96), however, the spot
rate has decreased from 8.1 to 5.9%, instead of growing by $7 \times 0.8\% = 5.6\%$.


In simple terms, both the Vasicek and the Hull–White model mean the following:
the FRC should basically reflect the market’s expectation of the average evolution
of the spot rate (up to a correction of the order of $\sigma^2$, but which turns out to be
very small, see above). However, since the FRC is on average increasing with the
maturity (situations when the FRC is ‘inverted’ are comparatively much rarer),
this would mean that the market systematically expects the spot rate to rise, which
it does not. It is hard to believe that the market persists in error for such a long
time. Hence, the upward slope of the FRC is not only related to what the market
expects on average: a systematic risk premium is needed to account for this
increase.
2.6.4 Risk-premium and the $\sqrt{\theta}$ law
The average FRC and value-at-risk pricing 

The observation that on average the FRC follows a simple $\sqrt{\theta}$ law (i.e. $\langle f(t, \theta) -
r(t) \rangle \propto \sqrt{\theta}$) suggests an intuitive, direct interpretation. At any time $t$, the market
anticipates either a future rise, or a decrease of the spot rate. However, the
average anticipated trend is, in the long run, zero, since the spot rate has bounded
fluctuations. Hence, the average market’s expectation is that the future spot rate
$r(t)$ will be close to its present value $r(t = 0)$. In this sense, the average FRC
should thus be flat. However, even in the absence of any trend in the spot rate, its
probable change between now and $t = \theta$ is (assuming the simplest random walk
behaviour) of the order of $\sigma\sqrt{\theta}$, where $\sigma$ is the volatility of the spot rate. Money
lenders agree at time $t$ on a loan at rate $f(t, \theta)$, which will run between time $t + \theta$
and $t + \theta + d\theta$. These money lenders will themselves borrow money from central
banks at the short-term rate prevailing at that date, i.e. $r(t + \theta)$. They will therefore
lose money whenever $r(t + \theta) > f(t, \theta)$. Hence, money lenders take a bet on
the future value of the spot rate and want to be sure not to lose their bet more
frequently than, say, once out of five. Thus their price for the forward rate is such
that the probability that the spot rate at time $t + \theta$, $r(t + \theta)$, actually exceeds $f(t, \theta)$
is equal to a certain number $p$:


\[ \int_{f(t, \theta)}^{\infty} P(r', t + \theta \,|\, r, t)\, dr' = p, \tag{2.29} \]

where P(r’, r'|r, 1) is the probability that the spot rate is equal to r’ at time fr’ 
knowing that it is r now (at time rf). Assuming that r’ follows a simple random 
walk centred around r(r) then leads to:!9 


f(t.0) =r(t) +aa(0)V0, a= V2erfe!(2p). (2.30) 


which indeed matches the empirical data, with p x 0.16. 

Hence, the shape of today’s FRC can be thought of as an envelope for the 
probable future evolutions of the spot rate. The market appears to price future rates 
through a Value at Risk procedure (Eqs. (2.29) and (2.30), see Chapter 3 below)
rather than through an averaging procedure. 
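
As a numerical illustration of Eqs. (2.29) and (2.30), the following minimal Python sketch computes the coefficient $A$ from the tolerated loss probability $p$ under the Gaussian random-walk assumption, and the resulting forward rate. The function names and the numerical inputs (spot rate, volatility, maturity) are illustrative assumptions, not values from the text; only $p = 0.16$ is the fitted value quoted above.

```python
from statistics import NormalDist

def var_coefficient(p: float) -> float:
    """A = sqrt(2) * erfc^{-1}(2p), i.e. the (1 - p) quantile of a unit Gaussian."""
    return NormalDist().inv_cdf(1.0 - p)

def forward_rate(r: float, sigma0: float, theta: float, p: float) -> float:
    """Eq. (2.30): f(t, theta) = r(t) + A * sigma(0) * sqrt(theta)."""
    return r + var_coefficient(p) * sigma0 * theta ** 0.5

A = var_coefficient(0.16)
# Hypothetical inputs: 5% spot rate, volatility of 1% per sqrt(year), 4-year maturity.
f = forward_rate(r=0.05, sigma0=0.01, theta=4.0, p=0.16)
print(f"A = {A:.3f}, f = {f:.4f}")
```

With $p \simeq 0.16$ the coefficient $A$ is close to 1, so under these assumptions the average FRC sits roughly one standard deviation $\sigma(0)\sqrt{\theta}$ above the spot rate.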


The anticipated trend and the volatility hump 


Let us now discuss, along the same lines, the shape of the FRC at a given instant of time, 
which of course deviates from the average square root law. For a given instant of time t, the 
market actually expects the spot rate to perform a biased random walk. We shall argue that 
a consistent interpretation is that the market estimates the trend m(t) by extrapolating the 


19 This assumption is certainly inadequate for small times, where large kurtosis effects are present. However, on 
the scale of months, these non-Gaussian effects can be considered as small. 


past behaviour of the spot rate itself. Hence, the probability distribution $P(r', t + \theta \mid r, t)$
used by the market is not centred around $r(t)$ but rather around:

$$r(t) + \int_0^{\theta} m(t, t+u)\, du, \tag{2.31}$$

where $m(t, t')$ can be called the anticipated bias at time $t'$, seen from time $t$.
It is reasonable to think that the market estimates m by extrapolating the recent past to 
the nearby future. Mathematically, this reads: 


$$m(t, t+u) = m_1(t)\,Z(u) \quad\text{where}\quad m_1(t) = \int_0^{\infty} K(v)\,\delta r(t - v)\, dv, \tag{2.32}$$


and where K (v) is an averaging kernel of the past variations of the spot rate. One may call 
Z(u) the trend persistence function; it is normalized such that Z(0) = 1, and describes 
how the present trend is expected to persist in the future. Equation (2.29) then gives: 


$$f(t,\theta) = r(t) + A\sigma(0)\sqrt{\theta} + m_1(t)\int_0^{\theta} Z(u)\, du. \tag{2.33}$$


This mechanism is a possible explanation of why the three functions introduced above,
namely $\Delta(\theta)$, $\sigma(\theta)$ and the correlation function $C(\theta)$ have similar shapes. Indeed, taking
for simplicity an exponential averaging kernel $K(v)$ of the form $\epsilon\exp[-\epsilon v]$, one finds:

$$\frac{dm_1(t)}{dt} = -\epsilon\, m_1 + \epsilon\frac{dr(t)}{dt} + \epsilon\,\xi(t), \tag{2.34}$$

where $\xi(t)$ is an independent noise of strength $\sigma_b^2$, added to introduce some extra noise in
the determination of the anticipated bias. In the absence of temporal correlations, one can
compute from the above equation the average value of $m_1^2$. It is given by:

$$\langle m_1^2\rangle = \frac{\epsilon}{2}\left(\sigma^2(0) + \sigma_b^2\right). \tag{2.35}$$


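In the simple model defined by Eq. (2.33) above, one finds that the correlation function
$C(\theta)$ is given by:

$$C(\theta) = \epsilon\int_0^{\theta} Z(u)\, du. \tag{2.36}$$

Using the above result for $\langle m_1^2\rangle$, one also finds:

$$\Delta(\theta) = \sqrt{\frac{\sigma^2(0) + \sigma_b^2}{2\epsilon}}\; C(\theta), \tag{2.37}$$

thus showing that $\Delta(\theta)$ and $C(\theta)$ are in this model simply proportional.
Turning now to the volatility $\sigma(\theta)$, one finds that it is given by:

$$\sigma^2(\theta) = [1 + C(\theta)]^2\,\sigma^2(0) + C(\theta)^2\,\sigma_b^2. \tag{2.38}$$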


We thus see that the maximum of $\sigma(\theta)$ is indeed related to that of $C(\theta)$. Intuitively, the
reason for the volatility maximum is as follows: a variation in the spot rate changes the
market anticipation for the trend $m_1(t)$. But this change of trend obviously has a larger
effect when multiplied by a longer maturity. For maturities beyond 1 year, however, the
decay of the persistence function comes into play and the volatility decreases again. The
relation Eq. (2.38) is tested against real data in Figure 2.24. An important prediction of
the model is that the deformation of the FRC should be strongly correlated with the past
trend of the spot rate, averaged over a time scale $1/\epsilon$ (see Eq. (2.34)). This correlation has
been convincingly established recently, with $1/\epsilon \simeq 100$ days.21


20 In reality, one should also take into account the fact that $A\sigma(0)$ can vary with time. This brings an extra
contribution both to $C(\theta)$ and to $\sigma(\theta)$.

Fig. 2.24. Comparison between the theoretical prediction and the observed daily volatility
of the forward rate at maturity $\theta$, in the period 1994-96. The dotted line corresponds to Eq.
(2.38) with $\sigma_b = \sigma(0)$, and the full line is obtained by adding the effect of the variation of
the coefficient $A\sigma(0)$ in Eq. (2.33), which adds a contribution proportional to $\theta$.
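
Equations (2.36) to (2.38) can be evaluated numerically once a trend persistence function $Z(u)$ is chosen. The sketch below is only an illustration: the exponential form of $Z$, the decay time `tau` and all numerical values are assumptions made here, not choices taken from the text. With a positive, monotonically decaying $Z$ the function $C(\theta)$ simply saturates, so this example shows how the formulas fit together rather than reproducing the fitted curve of Fig. 2.24.

```python
import math

def correlation_C(theta, eps, Z, steps=2000):
    """Eq. (2.36): C(theta) = eps * integral_0^theta Z(u) du (trapezoidal rule)."""
    h = theta / steps
    total = 0.5 * (Z(0.0) + Z(theta)) + sum(Z(i * h) for i in range(1, steps))
    return eps * h * total

def delta(theta, eps, sigma0, sigma_b, Z):
    """Eq. (2.37): Delta(theta) = sqrt((sigma0^2 + sigma_b^2) / (2 eps)) * C(theta)."""
    return math.sqrt((sigma0**2 + sigma_b**2) / (2.0 * eps)) * correlation_C(theta, eps, Z)

def forward_vol(theta, eps, sigma0, sigma_b, Z):
    """Eq. (2.38): sigma^2(theta) = (1 + C)^2 sigma0^2 + C^2 sigma_b^2."""
    C = correlation_C(theta, eps, Z)
    return math.sqrt((1.0 + C) ** 2 * sigma0**2 + C**2 * sigma_b**2)

# Assumed persistence function (not from the text): exponential decay over tau years.
tau = 1.0
Z = lambda u: math.exp(-u / tau)
# Hypothetical parameters: 1/eps of about 100 trading days expressed in years.
eps, sigma0 = 2.5, 0.01
for theta in (0.0, 0.5, 1.0, 2.0):
    print(theta, round(correlation_C(theta, eps, Z), 4),
          round(forward_vol(theta, eps, sigma0, sigma0, Z), 5))
```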


2.7 Correlation matrices (*) 


As we shall see in Chapter 3, an important aspect of risk management is the
estimation of the correlations between the price movements of different assets.
The probability of large losses for a certain portfolio or option book is dominated
by correlated moves of its different constituents: for example, a position which is
simultaneously long in stocks and short in bonds will be risky because stocks and
bonds move in opposite directions in crisis periods. The study of correlation (or
covariance) matrices thus has a long history in finance, and is one of the cornerstones
of Markowitz’s theory of optimal portfolios (see Section 3.3). However, a reliable
empirical determination of a correlation matrix turns out to be difficult: if one
considers $M$ assets, the correlation matrix contains $M(M - 1)/2$ entries, which
must be determined from $M$ time series of length $N$; if $N$ is not very large


21 See: A. Matacz, J.-P. Bouchaud, An empirical study of the interest rate curve, to appear in International
Journal of Theoretical and Applied Finance (2000).

compared to M, one should expect that the determination of the covariances
is noisy, and therefore that the empirical correlation matrix is to a large extent 
random, i.e. the structure of the matrix is dominated by ‘measurement’ noise. If 
this is the case, one should be very careful when using this correlation matrix in 
applications. From this point of view, it is interesting to compare the properties of 
an empirical correlation matrix C to a ‘null hypothesis’ purely random matrix as 
one could obtain from a finite time series of strictly uncorrelated assets. Deviations 
from the random matrix case might then suggest the presence of true information.22

The empirical correlation matrix $C$ is constructed from the time series of price
changes $\delta x_i^k$ (where $i$ labels the asset and $k$ the time) through the equation:

$$C_{ij} = \frac{1}{N}\sum_{k=1}^{N} \delta x_i^k\, \delta x_j^k. \tag{2.39}$$

In the following we assume that the average value of the $\delta x$’s has been subtracted
off, and that the $\delta x$’s are rescaled to have a constant unit volatility. The null
hypothesis of independent assets, which we consider now, translates into
the assumption that the coefficients $\delta x_i^k$ are independent, identically distributed,
random variables.23 The theory of random matrices, briefly expounded in Section
1.8, allows one to compute the density of eigenvalues of $C$, $\rho_C(\lambda)$, in the limit of
very large matrices: it is given by Eq. (1.120), with $Q = N/M$.
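
The construction of Eq. (2.39) can be sketched in a few lines of plain Python. The helper names and the toy data are hypothetical; the only steps taken from the text are the subtraction of the mean, the rescaling to unit volatility and the average of products over the $N$ observations.

```python
import math

def normalise(series):
    """Subtract the mean and rescale to unit volatility (population standard deviation)."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    vol = math.sqrt(sum(c * c for c in centred) / n)
    return [c / vol for c in centred]

def empirical_correlation(changes):
    """Eq. (2.39): C_ij = (1/N) * sum_k dx_i^k dx_j^k for M normalised series of length N."""
    dx = [normalise(s) for s in changes]
    M, N = len(dx), len(dx[0])
    return [[sum(dx[i][k] * dx[j][k] for k in range(N)) / N for j in range(M)]
            for i in range(M)]

# Toy data (made up): M = 3 assets, N = 5 price changes each.
changes = [
    [0.5, -1.0, 0.3, 0.8, -0.6],
    [0.4, -0.9, 0.1, 1.0, -0.6],
    [-0.2, 0.3, 0.9, -0.5, -0.5],
]
C = empirical_correlation(changes)
Q = len(changes[0]) / len(changes)  # Q = N / M
```

With $N$ this close to $M$ the off-diagonal entries are dominated by sampling noise, which is precisely the regime the random-matrix comparison is meant to diagnose.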

Now, we want to compare the empirical distribution of the eigenvalues of the
correlation matrix of stocks corresponding to different markets with the theoretical
prediction given by Eq. (1.120), based on the assumption that the correlation
matrix is random. We have studied numerically the density of eigenvalues of the
correlation matrix of $M = 406$ assets of the S&P 500, based on daily variations
during the years 1991-96, for a total of $N = 1309$ days (the corresponding
value of $Q$ is 3.22). An immediate observation is that the highest eigenvalue $\lambda_1$
is 25 times larger than the predicted $\lambda_{\max}$ (Fig. 2.25, inset). The corresponding
eigenvector is, as expected, the ‘market’ itself, i.e. it has roughly equal components
on all the $M$ stocks. The simplest ‘pure noise’ hypothesis is therefore inconsistent
with the value of $\lambda_1$. A more reasonable idea is that the components of the
correlation matrix which are orthogonal to the ‘market’ are pure noise. This amounts
to subtracting the contribution of $\lambda_1$ from the nominal value $\sigma^2 = 1$, leading to
$\sigma^2 = 1 - \lambda_1/M = 0.85$. The corresponding fit of the empirical distribution is
shown as a dotted line in Figure 2.25. Several eigenvalues are still above $\lambda_{\max}$ and
might contain some information, thereby reducing the variance of the effectively


22 This section is based on the following paper: L. Laloux, P. Cizeau, J.-P. Bouchaud, M. Potters, Random matrix
theory, RISK Magazine, 12, 69 (March 1999).

23 Note that even if the ‘true’ correlation matrix $C_{\mathrm{true}}$ is the identity matrix, its empirical determination from a
finite time series will generate non-trivial eigenvectors and eigenvalues.
Fig. 2.25. Smoothed density of the eigenvalues of $C$, where the correlation matrix $C$ is
extracted from $M = 406$ assets of the S&P 500 during the years 1991-96. For comparison
we have plotted the density Eq. (1.120) for $Q = 3.22$ and $\sigma^2 = 0.85$: this is the theoretical
value obtained assuming that the matrix is purely random except for its highest eigenvalue
(dotted line). A better fit can be obtained with a smaller value of $\sigma^2 = 0.74$ (solid line),
corresponding to 74% of the total variance. Inset: same plot, but including the highest
eigenvalue corresponding to the ‘market’, which is found to be ~ 30 times greater than
$\lambda_{\max}$.


random part of the correlation matrix. One can therefore treat $\sigma^2$ as an adjustable
parameter. The best fit is obtained for $\sigma^2 = 0.74$, and corresponds to the dark
line in Figure 2.25, which accounts quite satisfactorily for 94% of the spectrum,
whereas the 6% highest eigenvalues still exceed the theoretical upper edge by a
substantial amount. These 6% highest eigenvalues are however responsible for 26%
of the total volatility.
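
The numbers quoted above can be checked with a short sketch. It assumes that Eq. (1.120), which is not reproduced in this section, is the standard Marchenko-Pastur density with edges $\lambda_{\max,\min} = \sigma^2(1 \pm 1/\sqrt{Q})^2$; the function names are illustrative.

```python
import math

def mp_edges(Q: float, sigma2: float = 1.0):
    """Edges of the random-matrix eigenvalue density for Q = N/M >= 1."""
    root = math.sqrt(1.0 / Q)
    return sigma2 * (1.0 - root) ** 2, sigma2 * (1.0 + root) ** 2

def mp_density(lam: float, Q: float, sigma2: float = 1.0) -> float:
    """Marchenko-Pastur density; zero outside [lambda_min, lambda_max]."""
    lam_min, lam_max = mp_edges(Q, sigma2)
    if not lam_min < lam < lam_max:
        return 0.0
    return Q / (2.0 * math.pi * sigma2) * math.sqrt((lam_max - lam) * (lam - lam_min)) / lam

M, N = 406, 1309
Q = N / M
lam_min, lam_max = mp_edges(Q)      # pure-noise case, sigma^2 = 1
lam_1 = 25 * lam_max                # largest eigenvalue, 25 times the edge (from the text)
sigma2_rest = 1.0 - lam_1 / M       # variance left once the 'market' mode is removed
print(round(Q, 2), round(lam_max, 3), round(sigma2_rest, 2))
```

Under this assumption the upper edge for $Q = 3.22$ is about 2.4, so an eigenvalue 25 times larger carries roughly 15% of the total variance of the 406 assets, consistent with the value $\sigma^2 = 0.85$ used for the dotted line.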

One can repeat the above analysis on different stock markets (e.g. Paris, London, 
Zurich), or on volatility correlation matrices, to find very similar results. In a first 
approximation, the location of the theoretical edge, determined by fitting the part 
of the density which contains most of the eigenvalues, allows one to distinguish 
‘information’ from ‘noise’. 
The conclusion of this section is therefore that a large part of the empirical
correlation matrices must be considered as ‘noise’, and cannot be trusted for risk
management. In the next chapter, we will dwell on Markowitz’ portfolio theory,
which assumes that the correlation matrix is perfectly known. This theory must
therefore be taken with a grain of salt, bearing in mind the results of the present
section.


2.8 A simple mechanism for anomalous price statistics (*) 


We have chosen the family of TLD to represent the distribution of price fluctuations.
As mentioned above, Student distributions can also account quite well for
the shape of the empirical distributions. Hyperbolic distributions have also been
proposed. The choice of TLDs was motivated by two particular arguments:


• This family of distributions generalizes in a natural way the two classical
descriptions of price fluctuations, since the Gaussian corresponds to $\mu = 2$,
and the stable Lévy distributions correspond to $\alpha = 0$.

• The idea of TLD allows one to account for the deformation of the distributions
as the time horizon $N$ increases, and the anomalously high value of the Hurst
exponent $H$ at small times, crossing over to $H = 1/2$ for longer times.


However, in order to justify the choice of one family of laws over the others, one
needs a microscopic model for price fluctuations where a theoretical distribution
can be computed. In the next two sections, we propose such ‘models’ (in the
physicist’s sense). These models are not very realistic, but only aim at showing
that power-law distributions (possibly with an exponential truncation) appear quite
naturally. Furthermore, the model considered in this section leads to a value of
$\mu = 3/2$, close to the one observed on real data.24

We assume that the price increment $\delta x_k$ reflects the instantaneous offset between
supply and demand. More precisely, if each operator $\alpha$ on the market wants to buy
or sell a certain fixed quantity $q$ of the asset X, one has:

$$\delta x_k \propto q\sum_{\alpha} \varphi_\alpha, \tag{2.40}$$

where $\varphi_\alpha$ can take the values $-1$, $0$ or $+1$, depending on whether the operator
is selling, inactive, or buying. Suppose now that the operators interact among
themselves in a heterogeneous manner: with a small probability $p/\mathcal{N}$ (where
$\mathcal{N}$ is the total number of operators on the market), two operators $\alpha$ and $\beta$ are


24 This model was proposed in R. Cont, J.-P. Bouchaud, Herd behavior and aggregate fluctuations in financial
markets, to appear in Journal of Macroeconomic Dynamics (1999). See also: D. Stauffer, P. M. C. de Oliveira,
A. T. Bernardes, Monte Carlo simulation of volatility clustering in market model with herding, International
Journal of Theoretical and Applied Finance 2, 83 (1999).

‘connected’, and with probability $1 - p/\mathcal{N}$, they ignore each other. The factor
$1/\mathcal{N}$ means that on average, the number of operators connected to any particular
one is equal to $p$. Suppose finally that if two operators are connected, they come to
agree on the strategy they should follow, i.e. $\varphi_\alpha = \varphi_\beta$.

It is easy to understand that the population of operators clusters into groups
sharing the same opinion. These clusters are defined such that there exists a
connection between any two operators belonging to this cluster, although the
connection can be indirect and follow a certain ‘path’ between operators. These
clusters do not all have the same size, i.e. do not contain the same number of
operators. If the size of cluster $A$ is called $N(A)$, one can write:

$$\delta x \propto q\sum_{A} N(A)\,\varphi(A), \tag{2.41}$$

where $\varphi(A)$ is the common opinion of all operators belonging to $A$. The statistics
of the price increments $\delta x$ therefore reduces to the statistics of the size of clusters,
a classical problem in percolation theory [Stauffer]. One finds that as long as $p < 1$
(less than one ‘neighbour’ on average with whom one can exchange information),
then all $N(A)$’s are small compared with the total number of traders $\mathcal{N}$. More
precisely, the distribution of cluster sizes takes the following form in the limit
where $1 - p = \epsilon \ll 1$:

$$P(N) \underset{N \gg 1}{\propto} \frac{1}{N^{5/2}}\exp(-\epsilon^2 N), \qquad N \ll \mathcal{N}. \tag{2.42}$$

When $p = 1$ (percolation threshold), the distribution becomes a pure power-law
with an exponent $1 + \mu = 5/2$, and the CLT tells us that in this case, the distribution
of the price increments $\delta x$ is precisely a pure symmetric Lévy distribution of index
$\mu = 3/2$ (assuming that $\varphi = \pm 1$ play identical roles, that is if there is no global bias
pushing the price up or down). If $p < 1$, on the other hand, one finds that the Lévy
distribution is truncated exponentially far in the tail. If $p > 1$, a finite fraction of
the $\mathcal{N}$ traders have the same opinion: this leads to a crash.
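
The clustering mechanism can be simulated directly. The sketch below is a minimal illustration under stated assumptions: the union-find bookkeeping, the function names, the market size of 400 operators and the equal probabilities for the three opinions are all choices made here, not details given in the text.

```python
import random
from collections import Counter

def cluster_sizes(n_ops: int, p: float, rng: random.Random):
    """Link each pair of operators with probability p / n_ops; return cluster sizes."""
    parent = list(range(n_ops))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    link_prob = p / n_ops
    for a in range(n_ops):
        for b in range(a + 1, n_ops):
            if rng.random() < link_prob:
                parent[find(a)] = find(b)
    return list(Counter(find(a) for a in range(n_ops)).values())

def price_increment(sizes, q: float, rng: random.Random) -> float:
    """Eq. (2.41): each cluster A buys, sells or stays inactive as a block."""
    return q * sum(size * rng.choice((-1, 0, 1)) for size in sizes)

rng = random.Random(0)
sizes = cluster_sizes(n_ops=400, p=0.9, rng=rng)
dx = price_increment(sizes, q=1.0, rng=rng)
```

Repeating the draw many times and collecting a histogram of `sizes` would be one way to compare against Eq. (2.42), although a market this small can only probe the beginning of the power-law regime.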

This simple model is interesting but has one major drawback: one has to assume
that the parameter $p$ is smaller than one, but relatively close to one such that
Eq. (2.42) is valid, and non-trivial statistics follows. One should thus explain
why the value of $p$ spontaneously stabilizes in the neighbourhood of the critical
value $p = 1$. Certain models do actually have this property, of being close to
or at a critical point without having to fine tune any of their parameters. These
models are called ‘self-organized critical’ [Bak et al.]. In this spirit, let us mention
a very recent model of Sethna et al. [Dahmen and Sethna], meant to describe
the behaviour of magnets in a time dependent magnetic field. Transposed to the
present problem, this model describes the collective behaviour of a set of traders
exchanging information, but having all different a priori opinions. One trader can
however change his mind and take the opinion of his neighbours if the coupling is
strong, or if the strength of his a priori opinion is weak. All these traders feel an
external ‘field’, which represents for example a long-term expectation of economy
growth or recession, leading to an increased average pessimism or optimism. For a
large range of parameters, one finds that the buy orders (or the sell orders) organize
as avalanches of various sizes, distributed as a power-law with an exponential
cut-off, with $\mu = 5/4 = 1.25$. If the anticipations of the traders are too similar, or
if the coupling between agents is too strong (strong mimetism), the model again
leads to a crash-like behaviour.


2.9 A simple model with volatility correlations and tails (*) 


In this section, we show that a very simple feedback model where past high values 
of the volatility influence the present market activity does lead to tails in the 
probability distribution and, by construction, to volatility correlations. The present 
model is close in spirit to the ARCH models which have been much discussed in 
this context. The idea is to write: 


$$x_{k+1} = x_k + \sigma_k \xi_k, \tag{2.43}$$

where $\xi_k$ is a random variable of unit variance, and to postulate that the present day
volatility $\sigma_k$ depends on how the market feels the past market volatility. If the past
price variations happened to be high, the market interprets this as a reason to be
more nervous and increases its activity, thereby increasing $\sigma_k$. One could therefore
consider, as a toy-model:25

$$\sigma_{k+1} - \sigma_0 = (1 - \epsilon)(\sigma_k - \sigma_0) + \lambda\epsilon\,|\sigma_k \xi_k|, \tag{2.44}$$

which means that the market takes as an indicator of the past day activity the
absolute value of the close to close price difference $x_{k+1} - x_k$. Now, writing


$$|\sigma_k \xi_k| = \langle|\sigma\xi|\rangle + \sigma\tilde{\xi}, \tag{2.45}$$

and going to a continuous-time formulation, one finds that the volatility probability
distribution $P(\sigma, t)$ obeys the following ‘Fokker-Planck’ equation:

$$\frac{\partial P(\sigma,t)}{\partial t} = \epsilon\,\frac{\partial\,[(\sigma - \tilde{\sigma}_0)P(\sigma,t)]}{\partial\sigma} + c^2\epsilon^2\,\frac{\partial^2\,[\sigma^2 P(\sigma,t)]}{\partial\sigma^2}, \tag{2.46}$$

where $\tilde{\sigma}_0 = \sigma_0 - \lambda\epsilon\langle|\sigma\xi|\rangle$, and where $c^2$ is the variance of the noise $\lambda\tilde{\xi}$. The
equilibrium solution of this equation, $P_e(\sigma)$, is obtained by setting the left-hand
side to zero. One finds:

$$P_e(\sigma) = \frac{\exp(-\tilde{\sigma}_0/\sigma)}{\sigma^{1+\mu}}, \tag{2.47}$$

with $\mu = 1 + (c^2\epsilon)^{-1} > 1$. Now, for a large class of distributions for the random
noise $\xi$, for example Gaussian, it is easy to show, using a saddle-point calculation,
that the tails of the distribution of $\delta x$ are power-laws, with the same exponent $\mu$.
Interestingly, a short-memory market, corresponding to $\epsilon \simeq 1$, has much wilder
tails than a long-memory market: in the limit $\epsilon \to 0$, one indeed has $\mu \to \infty$. In
other words, over-reaction is a potential cause for power-law tails.


25 In the simplest ARCH model, the following equation is rather written in terms of the variance, and the second
term of the right-hand side is taken to be equal to: $\epsilon(\sigma_k \xi_k)^2$.
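
The feedback rule of Eqs. (2.43) and (2.44) is simple to iterate. The following sketch is illustrative only: the Gaussian choice for $\xi_k$ is one of the cases mentioned in the text, while the parameter values, the function names and the use of a fixed random seed are assumptions made here.

```python
import random

def simulate(n_steps, sigma0, eps, lam, seed=0):
    """Iterate Eqs. (2.43)-(2.44) with Gaussian xi_k; returns (prices, volatilities)."""
    rng = random.Random(seed)
    x, sigma = 0.0, sigma0
    xs, sigmas = [x], [sigma]
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)
        dx = sigma * xi                       # Eq. (2.43): x_{k+1} = x_k + sigma_k xi_k
        x += dx
        # Eq. (2.44): relaxation towards sigma0 plus feedback from |dx|
        sigma = sigma0 + (1.0 - eps) * (sigma - sigma0) + lam * eps * abs(dx)
        xs.append(x)
        sigmas.append(sigma)
    return xs, sigmas

def tail_exponent(c2, eps):
    """mu = 1 + 1 / (c^2 eps), the exponent appearing in Eq. (2.47)."""
    return 1.0 + 1.0 / (c2 * eps)

# Hypothetical parameters: 1% base volatility, memory of about 10 days, moderate feedback.
xs, sigmas = simulate(n_steps=10_000, sigma0=0.01, eps=0.1, lam=0.5)
```

For $0 < \epsilon < 1$ and $\lambda \geq 0$ the recursion keeps $\sigma_k$ strictly positive, and `tail_exponent` makes the stated trend explicit: a smaller $\epsilon$ (longer memory) gives a larger $\mu$, hence thinner tails.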


2.10 Conclusion 


The above statistical analysis reveals very important differences between the 
simple model usually adopted to describe price fluctuations, namely the geometric 
(continuous-time) Brownian motion and the rather involved statistics of real price 
changes. The geometric Brownian motion description is at the heart of most 
theoretical work in mathematical finance, and can be summarized as follows: 


• One assumes that the relative returns (rather than the absolute price increments)
are independent random variables.

• One assumes that the elementary time scale $\tau$ tends to zero; in other words that
the price process is a continuous-time process. It is clear that in this limit, the
number of independent price changes in an interval of time $T$ is $N = T/\tau \to \infty$.
One is thus in the limit where the CLT applies whatever the time scale $T$.


If the variance of the returns is finite, then according to the CLT, the only possibility
is that price changes obey a log-normal distribution. The process is also scale
invariant, that is, its statistical properties do not depend on the chosen time
scale (up to a multiplicative factor, see Section 1.5.3).26

The main difference between this model and real data is not only that the tails of 
the distributions are very poorly described by a Gaussian law, but also that several 
important time scales appear in the analysis of price changes: 


• A ‘microscopic’ time scale $\tau$ below which price changes are correlated. This
time scale is of the order of several minutes even on very liquid markets.

• A time scale $T^* = N^*\tau$, which corresponds to the time where non-Gaussian
effects begin to smear out, beyond which the CLT begins to operate. This
time scale $T^*$ depends much on the initial kurtosis on scale $\tau$. As a first
approximation, one has: $T^* = \kappa_1\tau$, which is equal to several days, even on
very liquid markets.

• A time scale corresponding to the correlation time of the volatility fluctuations,
which is of the order of 10 days to a month or even longer.

• And finally a time scale $T_\sigma$ governing the crossover from an additive model,
where absolute price changes are the relevant random variables, to a multiplicative
model, where relative returns become relevant. This time scale is also of the
order of months.


26 This scale invariance is more general than the Gaussian model discussed here, and is the basic assumption
underlying all ‘fractal’ descriptions of financial markets. These descriptions fail to capture the existence of
several important time scales that we discuss here.


It is clear that all these time scales must be taken into account in a faithful
representation of price changes, and that they play a crucial
role both in the pricing of derivative products, and in risk control. Different assets
differ in the value of their kurtosis, and in the value of these different time scales.
For this reason, a description where the volatility is the only parameter (as is the
case for Gaussian models) is bound to miss a great deal of the reality.
Extreme risks and optimal portfolios


Il n’est plus grande folie que de placer son salut dans l'incertitude.' 


(Madame de Sévigné, Lettres.) 


3.1 Risk measurement and diversification 


Measuring and controlling risks is now one of the major concerns across all modern
human activities. The financial markets, which act as highly sensitive economic
and political thermometers, are no exception. One of their roles is actually to allow
the different actors in the economic world to trade their risks, to which a price must
therefore be given.

The very essence of the financial markets is to fix thousands of prices all
day long, thereby generating enormous quantities of data that can be analysed
statistically. An objective measure of risk therefore appears to be easier to achieve
in finance than in most other human activities, where the definition of risk is vaguer, 
and the available data often very poor. Even if a purely statistical approach to 
financial risks is itself a dangerous scientists’ dream (see e.g. Fig. 1.1), it is fair to 
say that this approach has not been fully exploited until the very recent years, and 
that many improvements can be expected in the future, in particular concerning the 
control of extreme risks. The aim of this chapter is to introduce some classical ideas 
on financial risks, to illustrate their weaknesses, and to propose several theoretical 
ideas devised to handle more adequately the ‘rare events’ where the true financial 
risk resides. 


3.1.1 Risk and volatility 


Financial risk has been traditionally associated with the statistical uncertainty on 
the final outcome. Its traditional measure is the RMS, or, in financial terms, the 


1 Nothing is more foolish than betting on uncertainty for salvation. 
volatility. We will denote by $R(T)$ the logarithmic return on the time interval $T$,
defined by:

$$R(T) = \log\left[\frac{x(T)}{x_0}\right], \tag{3.1}$$

where $x(T)$ is the price of the asset X at time $T$, knowing that it is equal to $x_0$
today ($t = 0$). When $|x(T) - x_0| \ll x_0$, this definition is equivalent to $R(T) =
x(T)/x_0 - 1$.
If $P(x, T \mid x_0, 0)\,dx$ is the conditional probability of finding $x(T) = x$ within $dx$,
the volatility $\sigma$ of the investment is the standard deviation of $R(T)$, defined by:

$$\sigma^2 = \frac{1}{T}\left[\int P(x, T \mid x_0, 0)\,R^2(T)\, dx - \left(\int P(x, T \mid x_0, 0)\,R(T)\, dx\right)^2\right]. \tag{3.2}$$


The volatility is in general chosen as an adequate measure of risk associated with a
given investment. We notice however that this definition includes in a symmetrical
way both abnormal gains and abnormal losses. This fact is a priori curious. The
theoretical foundations behind this particular definition of risk are numerous:


• First, operational; the computations involving the variance are relatively simple
and can be generalized easily to multi-asset portfolios.

• Second, the Central Limit Theorem (CLT) presented in Chapter 1 seems to
provide a general and solid justification: by decomposing the motion from $x_0$
to $x(T)$ in $N = T/\tau$ increments, one can write:

$$x(T) = x_0 + \sum_{k=0}^{N-1} \delta x_k \quad\text{with}\quad \delta x_k = x_k \eta_k, \tag{3.3}$$

where $x_k = x(t = k\tau)$ and $\eta_k$ is by definition the instantaneous return.
Therefore, we have:

$$R(T) = \log\left[\frac{x(T)}{x_0}\right] = \sum_{k=0}^{N-1} \log(1 + \eta_k). \tag{3.4}$$


In the classical approach one assumes that the returns $\eta_k$ are independent variables.
From the CLT we learn that in the limit where $N \to \infty$, $R(T)$ becomes a Gaussian
random variable centred on a given average return $\tilde{m}T$, with $\tilde{m} = \langle\log(1 + \eta_k)\rangle/\tau$,
and whose standard deviation is given by $\sigma\sqrt{T}$.2 Therefore, in this limit, the entire
probability distribution of $R(T)$ is parameterized by two quantities only, $\tilde{m}$ and $\sigma$:
any reasonable measure of risk must therefore be based on $\sigma$.


2 To second order in $\eta_k \ll 1$, we find: $\sigma^2 = \frac{1}{\tau}\langle\eta^2\rangle$ and $\tilde{m} = \frac{1}{\tau}\langle\eta\rangle - \frac{1}{2}\sigma^2$.
However, as we have discussed at length in Chapter 1, this is not true for finite
$N$ (which corresponds to the financial reality: there are only roughly $N \simeq 320$
half-hour intervals in a working month), especially in the ‘tails’ of the distribution,
corresponding precisely to the extreme risks. We will discuss this point in detail
below.

One can give to $\sigma$ the following intuitive meaning: after a long enough time $T$,
the price of asset X is given by:

$$x(T) = x_0\exp[\tilde{m}T + \sigma\sqrt{T}\,\xi], \tag{3.5}$$

where $\xi$ is a Gaussian random variable with zero mean and unit variance. The
quantity $\sigma\sqrt{T}$ gives us the order of magnitude of the deviation from the expected
return. By comparing the two terms in the exponential, one finds that when $T \gg
\hat{T} \equiv \sigma^2/\tilde{m}^2$, the expected return becomes more important than the fluctuations,
which means that the probability that $x(T)$ is smaller than $x_0$ (and that the actual
rate of return over that period is negative) becomes small. The ‘security horizon’
$\hat{T}$ increases with $\sigma$. For a typical individual stock, one has $\tilde{m} = 10\%$ per year and
$\sigma = 20\%$ per year, which leads to a $\hat{T}$ as long as 4 years!

The quality of an investment is often measured by its ‘Sharpe ratio’ $S$, that is,
the ‘signal-to-noise’ ratio of the mean return $\tilde{m}T$ to the fluctuations $\sigma\sqrt{T}$:3

$$S = \frac{\tilde{m}T}{\sigma\sqrt{T}} \equiv \sqrt{\frac{T}{\hat{T}}}. \tag{3.6}$$

The Sharpe ratio increases with the investment horizon and is equal to 1 precisely
when $T = \hat{T}$. Practitioners usually define the Sharpe ratio for a 1-year horizon.
Note that the most probable value of $x(T)$, as given by Eq. (3.5), is equal
to $x_0\exp(\tilde{m}T)$, whereas the mean value of $x(T)$ is higher: assuming that $\xi$ is
Gaussian, one finds: $x_0\exp[\tilde{m}T + \sigma^2 T/2]$. This difference is due to the fact that
the returns $\eta_k$, rather than the absolute increments $\delta x_k$, are iid random variables.
However, if $T$ is short (say up to a few months), the difference between the two
descriptions is hard to detect. As explained in Section 2.2.1, a purely additive
description is actually more adequate at short times. In other words, we shall often
in the following write $x(T)$ as:

$$x(T) = x_0\exp[\tilde{m}T + \sigma\sqrt{T}\,\xi] \simeq x_0 + mT + \sqrt{DT}\,\xi, \tag{3.7}$$

where we have introduced the following notations: $m \equiv \tilde{m}x_0$, $D \equiv \sigma^2 x_0^2$, which
we shall use throughout the following. The non-Gaussian nature of the random
variable $\xi$ is therefore the most important factor determining the probability for
extreme risks.


3 It is customary to subtract from the mean return $\tilde{m}$ the risk-free rate in the definition of the Sharpe ratio.
3.1.2 Risk of loss and ‘Value at Risk’ (VaR)


The fact that financial risks are often described using the volatility is actually
intimately related to the idea that the distribution of price changes is Gaussian.
In the extreme case of ‘Lévy fluctuations’, for which the variance is infinite, this
definition of risk would obviously be meaningless. Even in a less risky world, this
measure of risk has three major drawbacks:


• The financial risk is obviously associated with losses and not with profits. A definition
of risk where both events play symmetrical roles is thus not in conformity with
the intuitive notion of risk, as perceived by professionals.

• As discussed at length in Chapter 1, a Gaussian model for the price fluctuations
is never justified for the extreme events, since the CLT only applies in the centre
of the distributions. Now, it is precisely these extreme risks that are of most
concern for all financial houses, and thus those which need to be controlled in
priority. In recent years, international regulators have tried to impose some rules
to limit the exposure of banks to these extreme risks.

• The presence of extreme events in the financial time series can actually lead to a
very bad empirical determination of the variance: its value can be substantially
changed by a few ‘big days’. A bold solution to this problem is simply to remove
the contribution of these so-called aberrant events! This rather absurd solution is
actually quite commonly used.


Both from a fundamental point of view, and for a better control of financial risks,
another definition of risk is thus needed. An interesting notion that we shall develop
now is the probability of extreme losses, or, equivalently, the ‘value-at-risk’ (VaR).
The probability to lose an amount $-\delta x$ larger than a certain threshold $\Lambda$ on a
given time horizon $\tau$ is defined as:

$$\mathcal{P}[\delta x < -\Lambda] = \mathcal{P}_<[-\Lambda] = \int_{-\infty}^{-\Lambda} P_\tau(\delta x)\, d\delta x, \tag{3.8}$$

where $P_\tau(\delta x)$ is the probability density for a price change on the time scale $\tau$. One
can alternatively define the risk as a level of loss (the ‘VaR’) $\Lambda_{\mathrm{VaR}}$ corresponding
to a certain probability of loss $\mathcal{P}_{\mathrm{VaR}}$ over the time interval $\tau$ (for example, $\mathcal{P}_{\mathrm{VaR}} =
1\%$):

$$\int_{-\infty}^{-\Lambda_{\mathrm{VaR}}} P_\tau(\delta x)\, d\delta x = \mathcal{P}_{\mathrm{VaR}}. \tag{3.9}$$

This definition means that a loss greater than $\Lambda_{\mathrm{VaR}}$ over a time interval of $\tau = 1$ day
(for example) happens only every 100 days on average for $\mathcal{P}_{\mathrm{VaR}} = 1\%$. Let us note
that this definition does not take into account the fact that losses can accumulate on
consecutive time intervals $\tau$, leading to an overall loss which might substantially
exceed $\Lambda_{\mathrm{VaR}}$. Similarly, this definition does not take into account the value of the
maximal loss ‘inside’ the period $\tau$. In other words, only the closing price over the
period $[k\tau, (k + 1)\tau]$ is considered, and not the lowest point reached during this
time interval: we shall come back to these temporal aspects of risk in Section 3.1.3.

Fig. 3.1. Extreme value distribution (the so-called Gumbel distribution) $P(\Lambda, N)$ when
$P_\tau(\delta x)$ decreases faster than any power-law. The most probable value, $\Lambda_{\max}$, has a
probability equal to 0.63 to be exceeded.

More precisely, one can discuss the probability distribution $P(\Lambda, N)$ for the
worst daily loss $\Lambda$ (we choose $\tau = 1$ day to be specific) on a temporal horizon
$T_{\mathrm{VaR}} = N\tau = \tau/\mathcal{P}_{\mathrm{VaR}}$. Using the results of Section 1.4, one has:

$$P(\Lambda, N) = N\,[\mathcal{P}_>(-\Lambda)]^{N-1}\, P_\tau(-\Lambda). \tag{3.10}$$

For $N$ large, this distribution takes a universal shape that only depends on the
asymptotic behaviour of $P_\tau(\delta x)$ for $\delta x \to -\infty$. In the important case for
practical applications where $P_\tau(\delta x)$ decays faster than any power-law, one finds
that $P(\Lambda, N)$ is given by Eq. (1.40), which is represented in Figure 3.1. This
distribution reaches a maximum precisely for $\Lambda = \Lambda_{\mathrm{VaR}}$, defined by Eq. (3.9). The
intuitive meaning of $\Lambda_{\mathrm{VaR}}$ is thus the value of the most probable worst day over a
time interval $T_{\mathrm{VaR}}$. Note that the probability for $\Lambda$ to be even worse ($\Lambda > \Lambda_{\mathrm{VaR}}$) is
equal to 63% (Fig. 3.1). One could define $\Lambda_{\mathrm{VaR}}$ in such a way that this probability
is smaller, by requiring a higher confidence level, for example 95%. This would
mean that on a given horizon (for example 100 days), the probability that the worst
day is found to be beyond $\Lambda_{\mathrm{VaR}}$ is equal to 5%. This $\Lambda_{\mathrm{VaR}}$ then corresponds to the
most probable worst day on a time period equal to 100/0.05 = 2000 days.
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Fig. 3.2. Growth of Amax as a function of the number of days N, for an asset of 
daily volatility equal to 1%, with different distribution of price increments: Gaussian, 
(symmetric) exponential, or power-law with x = 3 (cf. Eq. (1.83)). Note that for 
intermediate N, the exponential distribution leads to a larger VaR than the power-law; 
this relation is however inverted for N — oo. 
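The 63% and 2000-day figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes independent days and uses hypothetical helper names (`prob_worst_day_beyond`, `single_day_prob_for_confidence`) that are not part of the text; it is an illustration, not the authors' calculation.

```python
def prob_worst_day_beyond(p_var: float, n_days: int) -> float:
    """Probability that at least one of n_days independent days loses more
    than the level whose single-day exceedance probability is p_var."""
    return 1.0 - (1.0 - p_var) ** n_days


def single_day_prob_for_confidence(confidence: float, n_days: int) -> float:
    """Single-day loss probability such that the worst of n_days stays
    within the corresponding level with the given confidence."""
    return 1.0 - confidence ** (1.0 / n_days)


n = 100
# With P_VaR = 1/N, the worst day exceeds Lambda_VaR with probability close to 1 - 1/e.
p_exceed_mode = prob_worst_day_beyond(1.0 / n, n)

# Requiring 95% confidence on the same 100-day horizon lowers the single-day
# probability, which corresponds to a much longer equivalent horizon.
p_95 = single_day_prob_for_confidence(0.95, n)
equivalent_horizon_days = 1.0 / p_95

print(f"P(worst day beyond Lambda_VaR) = {p_exceed_mode:.3f}")
print(f"equivalent horizon = {equivalent_horizon_days:.0f} days")
```

The equivalent horizon comes out slightly below the 2000 days of the text, which uses the first-order approximation 100/0.05.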


In the Gaussian case, the VaR is directly related to the volatility $\sigma$. Indeed, for a
Gaussian distribution of RMS equal to $\sigma_1 x_0 = \sigma x_0 \sqrt{\tau}$, and of mean $m_1$, one finds
that $\Lambda_{\rm VaR}$ is given by:

$$\mathcal{P}_{G<}\left(-\frac{\Lambda_{\rm VaR} + m_1}{\sigma_1 x_0}\right) = \mathcal{P}_{\rm VaR} \;\longrightarrow\; \Lambda_{\rm VaR} = \sqrt{2}\,\sigma_1 x_0\, \mathrm{erfc}^{-1}[2\mathcal{P}_{\rm VaR}] - m_1, \qquad (3.11)$$

(cf. Eq. (1.68)). When $m_1$ is small, minimizing $\Lambda_{\rm VaR}$ is thus equivalent to minimizing
$\sigma$. It is furthermore important to note the very slow growth of $\Lambda_{\rm VaR}$ as a function
of the time horizon in a Gaussian framework. For example, for $T_{\rm VaR} = 250\tau$ (1
market year), corresponding to $\mathcal{P}_{\rm VaR} = 0.004$, one finds that $\Lambda_{\rm VaR} \simeq 2.65\,\sigma_1 x_0$.
Typically, for $\tau = 1$ day, $\sigma_1 = 1\%$, and therefore, the most probable worst day
on a market year is equal to $-2.65\%$, and grows only to $-3.35\%$ over a 10-year
horizon!
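As a quick numerical check of Eq. (3.11), the following sketch evaluates the Gaussian VaR for the two horizons quoted above. It relies on the identity $\sqrt{2}\,\mathrm{erfc}^{-1}(2\mathcal{P}) = \Phi^{-1}(1 - \mathcal{P})$, with $\Phi^{-1}$ the standard normal quantile function; the function name `gaussian_var` is an illustrative choice, not notation from the text.

```python
from statistics import NormalDist


def gaussian_var(p_var: float, sigma: float = 1.0, mean: float = 0.0) -> float:
    """Loss level exceeded with probability p_var for a Gaussian price change
    of standard deviation sigma and mean `mean` (Eq. (3.11))."""
    z = NormalDist().inv_cdf(1.0 - p_var)  # equals sqrt(2) * erfcinv(2 * p_var)
    return sigma * z - mean


# Daily volatility of 1%, zero mean: results are expressed in per cent.
var_one_year = gaussian_var(1.0 / 250.0, sigma=1.0)
var_ten_years = gaussian_var(1.0 / 2500.0, sigma=1.0)

print(f"1 market year:  {var_one_year:.2f}%")
print(f"10 market years: {var_ten_years:.2f}%")
```

Multiplying the horizon by ten increases the most probable worst day by only about a quarter, which is the slow growth noted in the text.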

In the general case, however, there is no useful link between $\sigma$ and $\Lambda_{\rm VaR}$. In
some cases, decreasing one actually increases the other (see below, Sections 3.2.3
and 3.4). Let us nevertheless mention the Chebyshev inequality, often invoked by
the volatility fans, which states that if $\sigma$ exists,

$$\Lambda_{\rm VaR}^2 \le \frac{\sigma_1^2 x_0^2}{\mathcal{P}_{\rm VaR}}. \qquad (3.12)$$

This inequality suggests that in general, the knowledge of $\sigma$ is tantamount to that
of $\Lambda_{\rm VaR}$. This is however completely wrong, as illustrated in Figure 3.2. We have
represented $\Lambda_{\max}$ ($= \Lambda_{\rm VaR}$ with $\mathcal{P}_{\rm VaR} = 0.63$) as a function of the time horizon
$T_{\rm VaR} = N\tau$ for three distributions $P_\tau(\delta x)$ which all have exactly the same variance,
but decay as a Gaussian, as an exponential, or as a power-law with an exponent
$\mu = 3$ (cf. Eq. (1.83)). Of course, the slower the decay of the distribution, the
faster the growth of $\Lambda_{\rm VaR}$ when $T_{\rm VaR} \to \infty$.

Table 3.1. J.P. Morgan international bond indices (expressed in French Francs),
analysed over the period 1989-93, and worst day observed in 1994. The
predicted numbers correspond to the most probable worst day $\Lambda_{\max}$. The amplitude
of the worst day with 95% confidence level is easily obtained, in the case of
exponential tails, by multiplying $\Lambda_{\max}$ by a factor 1.53. The last line corresponds
to a portfolio made of the 11 bonds with equal weight. All numbers are in per cent

Country          Worst day     Worst day   Worst day
                 Log-normal    TLD         Observed
Belgium          0.92          1.14        [illegible]
Canada           2.07          2.78        2.76
Denmark          0.92          1.08        1.64
France           0.59          0.74        1.24
Germany          0.60          0.79        1.44
Great Britain    1.59          2.08        2.08
Italy            1.31          2.60        4.18
Japan            0.65          0.82        1.08
Netherlands      0.57          0.70        1.10
Spain            1.22          1.72        1.98
United States    1.85          2.31        2.26
Portfolio        0.61          0.80        [illegible]

Table 3.1 shows, in the case of international bond indices, a comparison between
the prediction of the most probable worst day using a Gaussian model, or using 
the observed exponential character of the tail of the distribution, and the actual 
worst day observed the following year. It is clear that the Gaussian prediction is 
systematically over-optimistic. The exponential model leads to a number which is 
seven times out of 11 below the observed result, which is indeed the expected result 
(Fig. 3.1). 

Note finally that the measure of risk as a loss probability keeps its meaning even
if the variance is infinite, as for a Lévy process. Suppose indeed that $P_\tau(\delta x)$ decays
very slowly when $\delta x$ is very large, as:

$$P_\tau(\delta x) \underset{\delta x \to -\infty}{\simeq} \frac{\mu A^\mu}{|\delta x|^{1+\mu}}, \qquad (3.13)$$

with $\mu < 2$, such that $\langle \delta x^2 \rangle = \infty$. $A^\mu$ is the ‘tail amplitude’ of the distribution
$P_\tau$; $A$ gives the order of magnitude of the probable values of $\delta x$. The calculation
of $\Lambda_{\rm VaR}$ is immediate and leads to:

$$\Lambda_{\rm VaR} = A\, \mathcal{P}_{\rm VaR}^{-1/\mu}, \qquad (3.14)$$

which shows that in order to minimize the VaR one should minimize $A^\mu$, independently
of the probability level $\mathcal{P}_{\rm VaR}$.
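A minimal sketch of Eq. (3.14): integrating the tail (3.13) from $-\infty$ to $-\Lambda$ gives a loss probability $(A/\Lambda)^\mu$, and inverting it yields the VaR. The function names and the numerical values ($A = 1$, $\mu = 1.5$) are illustrative assumptions.

```python
def power_law_var(tail_amplitude: float, p_var: float, mu: float) -> float:
    """Lambda_VaR = A * P_VaR**(-1/mu) for a pure power-law tail (Eq. (3.14))."""
    return tail_amplitude * p_var ** (-1.0 / mu)


def power_law_loss_probability(loss: float, tail_amplitude: float, mu: float) -> float:
    """Integral of mu * A**mu / |x|**(1 + mu) from -infinity to -loss."""
    return (tail_amplitude / loss) ** mu


var_1pct = power_law_var(tail_amplitude=1.0, p_var=0.01, mu=1.5)
check = power_law_loss_probability(var_1pct, tail_amplitude=1.0, mu=1.5)

# Halving A halves the VaR whatever the probability level chosen.
ratio_1pct = power_law_var(0.5, 0.01, 1.5) / power_law_var(1.0, 0.01, 1.5)
ratio_5pct = power_law_var(0.5, 0.05, 1.5) / power_law_var(1.0, 0.05, 1.5)

print(f"Lambda_VaR at 1% = {var_1pct:.3f}, recovered probability = {check:.4f}")
```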
Note however that in this case, the most probable loss level is not equal to $\Lambda_{\rm VaR}$ but
to $(1 + 1/\mu)^{1/\mu} \Lambda_{\rm VaR}$. The previous Gumbel case again corresponds formally to the limit
$\mu \to \infty$.


As we have noted in Chapter 1, $A^\mu$ is actually the natural generalization of the
variance in this case. Indeed, the Fourier transform $\hat{P}_\tau(z)$ of $P_\tau$ behaves, for small
$z$, as $\exp(-b_\mu A^\mu |z|^\mu)$ for $\mu < 2$ ($b_\mu$ is a certain numerical factor), and simply as
$\exp(-D\tau z^2/2)$ for $\mu > 2$, where $A^2 = D\tau$ is precisely the variance of $P_\tau$.


3.1.3 Temporal aspects: drawdown and cumulated loss 
Worst low 


A first problem which arises is the following: we have defined $\mathcal{P}$ as the probability
for the loss observed at the end of the period $[k\tau, (k+1)\tau]$ to be at least equal
to $\Lambda$. However, in general, a worse loss still has been reached within this time
interval. What is then the probability that the worst point reached within the interval
(the ‘low’) is at least equal to $\Lambda$? The answer is easy for symmetrically distributed
increments (we thus neglect the average return $m\tau \ll \Lambda$, which is justified for
small enough time intervals): this probability is simply equal to $2\mathcal{P}$,

$$\mathcal{P}[X_{\rm lo} - X_{\rm op} < -\Lambda] = 2\,\mathcal{P}[X_{\rm cl} - X_{\rm op} < -\Lambda], \qquad (3.15)$$

where $X_{\rm op}$ is the value at the beginning of the interval (open), $X_{\rm cl}$ at the end (close)
and $X_{\rm lo}$ the lowest value in the interval (low). The reason for this is that for each
trajectory just reaching $-\Lambda$ between $k\tau$ and $(k+1)\tau$, followed by a path which
ends up above $-\Lambda$ at the end of the period, there is a mirror path with precisely
the same weight which reaches a ‘close’ value beyond $-\Lambda$.⁴ This factor of 2 in
cumulative probability is illustrated in Figure 3.3. Therefore, if one wants to take
into account the possibility of further loss within the time interval of interest in
the computation of the VaR, one should simply divide by a factor 2 the probability
level $\mathcal{P}_{\rm VaR}$ appearing in Eq. (3.9).

A simple consequence of this ‘factor 2’ rule is the following: the average value of
the maximal excursion of the price during time $\tau$, given by the high minus the low
over that period, is equal to twice the average of the absolute value of the variation
from the beginning to the end of the period ($\langle |{\rm open} - {\rm close}| \rangle$). This relation can be
tested on real data; the corresponding empirical factor is reported in Table 3.2.

4 In fact, this argument (which dates back to Bachelier himself, in 1900!) assumes that the moment where the
trajectory reaches the point $-\Lambda$ can be precisely identified. This is not the case for a discontinuous process,
for which the doubling of $\mathcal{P}$ is only an approximate result.

Fig. 3.3. Cumulative probability distribution of losses (Rank histogram) for the S&P 500
for the period 1989-98. Shown is the daily loss $(X_{\rm cl} - X_{\rm op})/X_{\rm op}$ (thin line, left axis) and
the intraday loss $(X_{\rm lo} - X_{\rm op})/X_{\rm op}$ (thick line, right axis). Note that the right axis is shifted
downwards by a factor of two with respect to the left one, so in theory the two lines should
fall on top of one another.

Table 3.2. Average value of the absolute value of the open/close daily returns and
maximum daily range (high-low) over open for S&P 500, DEM/$ and Bund. Note
that the ratio of these two quantities is indeed close to 2. Data from 1989 to 1998

            Mean |open/close return|    Mean (high-low)/open    Ratio
S&P 500     0.472%                      0.924%                  1.96
DEM/$       0.392%                      0.804%                  2.05
Bund        0.250%                      0.496%                  1.98
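The ‘factor 2’ rule can also be illustrated on synthetic data. The sketch below simulates symmetric Gaussian random walks (an assumption made here for illustration; the parameters, the seed and the function name `range_to_move_ratio` are arbitrary) and compares the average high-minus-low range with the average absolute open-to-close move. Because the simulated walk is sampled at discrete steps, the ratio is expected to fall slightly below 2.

```python
import random


def range_to_move_ratio(n_paths: int = 1000, n_steps: int = 500, seed: int = 0) -> float:
    """Ratio <high - low> / <|close - open|> for driftless Gaussian random walks."""
    rng = random.Random(seed)
    total_range = 0.0
    total_move = 0.0
    for _ in range(n_paths):
        x = high = low = 0.0  # the open price is taken as the origin
        for _ in range(n_steps):
            x += rng.gauss(0.0, 1.0)
            if x > high:
                high = x
            elif x < low:
                low = x
        total_range += high - low
        total_move += abs(x)  # x is now the close minus the open
    return total_range / total_move


ratio = range_to_move_ratio()
print(f"<high - low> / <|close - open|> = {ratio:.2f}")
```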


Cumulated losses 


Another very important aspect of the problem is to understand how losses can
accumulate over successive time periods. For example, a bad day can be followed
by several other bad days, leading to a large overall loss. One would thus like to
estimate the most probable value of the worst week, month, etc. In other words,
one would like to construct the graph of $\Lambda_{\rm VaR}(N\tau)$, for a fixed overall investment
horizon $T$.

The answer is straightforward in the case where the price increments are
independent random variables. When the elementary distribution $P_\tau$ is Gaussian,
$P_{N\tau}$ is also Gaussian, with a variance multiplied by $N$. At the same time, the
number of different intervals of size $N\tau$ for a fixed investment horizon $T$ decreases
by a factor $N$. For large enough $T$, one then finds:

$$\Lambda_{\rm VaR}(N\tau)|_T \simeq \sigma_1 x_0 \sqrt{2N \log\left(\frac{T}{\sqrt{2\pi}\, N\tau}\right)}, \qquad (3.16)$$

where the notation $|_T$ means that the investment period is fixed.⁵ The main effect
is then the $\sqrt{N}$ increase of the volatility, up to a small logarithmic correction.

The case where $P_\tau(\delta x)$ decreases as a power-law has been discussed in Chapter
1: for any $N$, the far tail remains a power-law (that is progressively ‘eaten up’ by
the Gaussian central part of the distribution if the exponent $\mu$ of the power-law is
greater than 2, but keeps its integrity whenever $\mu < 2$). For finite $N$, the largest
moves will always be described by the power-law tail. Its amplitude $A^\mu$ is simply
multiplied by a factor $N$ (cf. Section 1.5.2). Since the number of independent
intervals is divided by the same factor $N$, one finds:⁶

$$\Lambda_{\rm VaR}(N\tau)|_T = A\, N^{1/\mu} \left(\frac{T}{N\tau}\right)^{1/\mu} = A \left(\frac{T}{\tau}\right)^{1/\mu}, \qquad (3.17)$$

independently of $N$. Note however that for $\mu > 2$, the above result is only valid if
$\Lambda_{\rm VaR}(N\tau)$ is located outside of the Gaussian central part, the size of which grows
as $\sigma\sqrt{N}$ (Fig. 3.4). In the opposite case, the Gaussian formula (3.16) should be
used.

One can of course ask a slightly different question, by fixing not the investment
horizon $T$ but rather the probability of occurrence. This amounts to multiplying
both the period over which the loss is measured and the investment horizon by the
same factor $N$. In this case, one finds that:

$$\Lambda_{\rm VaR}(N\tau) = N^{1/\mu} \Lambda_{\rm VaR}(\tau). \qquad (3.18)$$

The growth of the value-at-risk with $N$ is thus faster for small $\mu$, as expected. The
case of Gaussian fluctuations corresponds to $\mu = 2$.
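The three scaling laws (3.16) to (3.18) are easy to compare numerically. The sketch below is only an illustration under assumed values (daily volatility 1, $T/\tau = 2500$, $\mu = 3$); the function names are hypothetical.

```python
import math


def gaussian_var_fixed_horizon(n: int, sigma1_x0: float, horizon_over_tau: float) -> float:
    """Eq. (3.16): most probable worst N-day loss over a fixed horizon T (Gaussian case)."""
    return sigma1_x0 * math.sqrt(
        2.0 * n * math.log(horizon_over_tau / (math.sqrt(2.0 * math.pi) * n))
    )


def power_law_var_fixed_horizon(n: int, amplitude: float, mu: float, horizon_over_tau: float) -> float:
    """Eq. (3.17), written before simplification to show that the N-dependence cancels."""
    return amplitude * n ** (1.0 / mu) * (horizon_over_tau / n) ** (1.0 / mu)


def var_fixed_probability(n: int, var_one_step: float, mu: float) -> float:
    """Eq. (3.18): growth of the VaR with N at fixed probability level."""
    return n ** (1.0 / mu) * var_one_step


horizon = 2500.0  # T / tau, roughly ten market years of daily data
gauss_ratio = gaussian_var_fixed_horizon(5, 1.0, horizon) / gaussian_var_fixed_horizon(1, 1.0, horizon)
pl_day = power_law_var_fixed_horizon(1, 1.0, 3.0, horizon)
pl_week = power_law_var_fixed_horizon(5, 1.0, 3.0, horizon)

print(f"Gaussian worst week / worst day = {gauss_ratio:.2f} (sqrt(5) = {math.sqrt(5):.2f})")
print(f"power-law worst day = {pl_day:.2f}, worst week = {pl_week:.2f}")
```

In the Gaussian case the worst week is a little less than $\sqrt{5}$ times the worst day because of the logarithmic correction, whereas the power-law estimate does not depend on $N$ at fixed $T$.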


5 One can also take into account the average return, $m_1 = \langle \delta x \rangle$. In this case, one must subtract the
quantity $m_1 N$ from $\Lambda_{\rm VaR}(N\tau)|_T$ (the potential losses are indeed smaller if the average return is positive).
6 cf. previous footnote.

Fig. 3.4. Distribution of cumulated losses for finite $N$: the central region, of width
$\sim \sigma N^{1/2}$, is Gaussian. The tails, however, remember the specific nature of the elementary
distribution $P_\tau(\delta x)$, and are in general fatter than Gaussian. The point is then to determine
whether the confidence level $\mathcal{P}_{\rm VaR}$ puts $\Lambda_{\rm VaR}(N\tau)$ within the Gaussian part or far in the
tails.


Drawdowns 


One can finally analyse the amplitude of a cumulated loss over a period that is not a
priori limited. More precisely, the question is the following: knowing that the price
of the asset today is equal to $x_0$, what is the probability that the lowest price ever
reached in the future will be $x_{\min}$; and how long will such a drawdown last? This
is a classical problem in probability theory [Feller, vol. II, p. 404]. Let us denote
as $\Lambda_{\max} = x_0 - x_{\min}$ the maximum amplitude of the loss. If $P_\tau(\delta x)$ decays at least
exponentially when $\delta x \to -\infty$, the result is that the tail of the distribution of $\Lambda_{\max}$
behaves as an exponential:

$$P(\Lambda_{\max}) \underset{\Lambda_{\max} \to \infty}{\propto} \exp\left(-\frac{\Lambda_{\max}}{\Lambda_0}\right), \qquad (3.19)$$

where $\Lambda_0 > 0$ is the finite solution of the following equation:

$$\int_{-\infty}^{+\infty} \exp\left(-\frac{\delta x}{\Lambda_0}\right) P_\tau(\delta x)\, d\delta x = 1. \qquad (3.20)$$

Note that for this last equation to be meaningful, $P_\tau(\delta x)$ should decrease at least
as $\exp(-|\delta x|/\Lambda_0)$ for large negative $\delta x$'s. It is clear (see below) that if $P_\tau(\delta x)$
decreases as a power-law, say, the distribution of cumulated losses cannot decay
exponentially, since it would then decay faster than that of individual losses!




It is interesting to show how this result can be obtained. Let us introduce the cumulative
distribution $\mathcal{P}_>(\Lambda) = \int_\Lambda^\infty P(\Lambda_{\max})\, d\Lambda_{\max}$. The first step of the walk, of size $\delta x$, can either
exceed $-\Lambda$, or stay above it. In the first case, the level $\Lambda$ is immediately reached. In
the second case, one recovers the very same problem as initially, but with $-\Lambda$ shifted to
$-\Lambda - \delta x$. Therefore, $\mathcal{P}_>(\Lambda)$ obeys the following equation:

$$\mathcal{P}_>(\Lambda) = \int_{-\infty}^{-\Lambda} P_\tau(\delta x)\, d\delta x + \int_{-\Lambda}^{+\infty} P_\tau(\delta x)\, \mathcal{P}_>(\Lambda + \delta x)\, d\delta x. \qquad (3.21)$$

If one can neglect the first term in the right-hand side for large $\Lambda$ (which should be
self-consistently checked), then, asymptotically, $\mathcal{P}_>(\Lambda)$ should obey the following equation:

$$\mathcal{P}_>(\Lambda) = \int_{-\Lambda}^{+\infty} P_\tau(\delta x)\, \mathcal{P}_>(\Lambda + \delta x)\, d\delta x. \qquad (3.22)$$

Inserting an exponential shape for $\mathcal{P}_>(\Lambda)$ then leads to Eq. (3.20) for $\Lambda_0$, in the limit
$\Lambda \to \infty$. This result is however only valid if $P_\tau(\delta x)$ decays sufficiently fast for large
negative $\delta x$'s, such that Eq. (3.20) has a non-trivial solution.

Let us study two simple cases:

• For Gaussian fluctuations, one has:

$$P_\tau(\delta x) = \frac{1}{\sqrt{2\pi D\tau}} \exp\left(-\frac{(\delta x - m\tau)^2}{2D\tau}\right). \qquad (3.23)$$

Equation (3.20) thus becomes:

$$-\frac{m}{\Lambda_0} + \frac{D}{2\Lambda_0^2} = 0 \;\longrightarrow\; \Lambda_0 = \frac{D}{2m}. \qquad (3.24)$$

$\Lambda_0$ gives the order of magnitude of the worst drawdown. One can understand
the above result as follows: $\Lambda_0$ is the amplitude of the probable fluctuation over
the characteristic time scale $\hat{T} = D/m^2$ introduced above. By definition, for
times shorter than $\hat{T}$, the average return $m$ is negligible. Therefore, one has:
$\Lambda_0 \propto \sqrt{D\hat{T}} = D/m$.

If $m = 0$, the very idea of worst drawdown loses its meaning: if one waits a
long enough time, the price can then reach arbitrarily low values. It is natural
to introduce a quality factor $Q$ that compares the average return $m_1 = m\tau$
to the amplitude of the worst drawdown $\Lambda_0$. One thus has $Q = m_1/\Lambda_0 = 2m^2\tau/D = 2\tau/\hat{T}$.
The larger the quality factor, the smaller the time needed
to end a drawdown period.

• The case of exponential tails is important in practice (cf. Chapter 2). Choosing
for simplicity the normalized form $P_\tau(\delta x) = (\alpha/2) \exp(-\alpha|\delta x - m\tau|)$, the equation for $\Lambda_0$ is found
to be:

$$\frac{\alpha^2}{\alpha^2 - \Lambda_0^{-2}} \exp\left(-\frac{m\tau}{\Lambda_0}\right) = 1. \qquad (3.25)$$

In the limit where $m\tau\alpha \ll 1$ (i.e. that the potential losses over the time interval
$\tau$ are much larger than the average return over the same period), one finds:
$\Lambda_0 = 1/(m\tau\alpha^2)$.

• More generally, if the width of $P_\tau(\delta x)$ is much smaller than $\Lambda_0$, one can expand
the exponential and find that the above equation $-(m/\Lambda_0) + (D/2\Lambda_0^2) = 0$ is
still valid.
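The drawdown scale $\Lambda_0$ of the exponential-tail case can be obtained by solving Eq. (3.25) numerically. The following sketch uses a plain bisection on $s = 1/\Lambda_0$ and compares the result with the small-drift limit $1/(m\tau\alpha^2)$; the helper names and the parameter values ($\alpha = 1$, $m\tau = 0.01$) are illustrative assumptions, and the routine expects $m\tau\alpha$ to be well above $10^{-9}$.

```python
import math


def bisect(f, lo: float, hi: float, iterations: int = 200) -> float:
    """Root of f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def drawdown_scale_gaussian(m: float, d: float) -> float:
    """Eq. (3.24): Lambda_0 = D / (2 m)."""
    return d / (2.0 * m)


def drawdown_scale_exponential(m_tau: float, alpha: float) -> float:
    """Non-trivial solution of Eq. (3.25), found in terms of s = 1 / Lambda_0."""
    def log_condition(s: float) -> float:
        # Logarithm of the left-hand side of Eq. (3.25); zero at the solution.
        return -math.log1p(-(s / alpha) ** 2) - m_tau * s

    s = bisect(log_condition, 1e-9 * alpha, alpha * (1.0 - 1e-12))
    return 1.0 / s


lambda_0 = drawdown_scale_exponential(m_tau=0.01, alpha=1.0)
small_drift_limit = 1.0 / (0.01 * 1.0 ** 2)
print(f"Lambda_0 = {lambda_0:.3f}, small-drift limit = {small_drift_limit:.3f}")
```

Since the variance of this exponential density is $D\tau = 2/\alpha^2$, the limit $1/(m\tau\alpha^2)$ coincides with the Gaussian result $D/2m$, as stated in the last bullet above.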


How long will these drawdowns last? The answer can only be statistical. The
probability for the time of first return $T$ to the initial investment level $x_0$, which
one can take as the definition of a drawdown (although it could of course be
immediately followed by a second drawdown), has, for large $T$, the following form:

$$P(T) \simeq \frac{\tau^{1/2}}{T^{3/2}} \exp\left(-\frac{T}{\hat{T}}\right). \qquad (3.26)$$

The probability for a drawdown to last much longer than $\hat{T}$ is thus very small.
In this sense, $\hat{T}$ appears as the characteristic drawdown time. Note that it is not
equal to the average drawdown time, which is on the order of $\sqrt{\tau\hat{T}}$, and thus
much smaller than $\hat{T}$. This is related to the fact that short drawdowns have a large
probability: as $\tau$ decreases, a large number of drawdowns of order $\tau$ appear, thereby
reducing their average size.


3.1.4 Diversification and utility — satisfaction thresholds 


It is intuitively clear that one should not put all one's eggs in the same basket. A
diversified portfolio, composed of different assets with small mutual correlations,
is less risky because the gains of some of the assets more or less compensate the
loss of the others. Now, an investment with a small risk and small return must
sometimes be preferred to a high yield, but very risky, investment.

The theoretical justification for the idea of diversification comes again from the
CLT. A portfolio made up of $M$ uncorrelated assets of equal volatility, with
weight $1/M$, has an overall volatility reduced by a factor $\sqrt{M}$. Correspondingly,
the amplitude (and duration) of the worst drawdown is divided by a factor $M$ (cf.
Eq. (3.24)), which is obviously satisfying.
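The $\sqrt{M}$ and $M$ factors follow directly from summing the variances of independent assets. A minimal sketch, assuming $M$ identical uncorrelated assets with the same return $m$ and diffusion constant $D$ (the numbers and function names are illustrative):

```python
import math


def equal_weight_portfolio(m: float, d: float, n_assets: int) -> tuple[float, float]:
    """Mean return and variance of M uncorrelated identical assets held with weight 1/M."""
    weight = 1.0 / n_assets
    mean = n_assets * weight * m           # unchanged by diversification
    variance = n_assets * weight ** 2 * d  # sum of p_i^2 D_i
    return mean, variance


def worst_drawdown_scale(m: float, d: float) -> float:
    """Eq. (3.24): Lambda_0 = D / (2 m)."""
    return d / (2.0 * m)


m, d, n_assets = 0.10, 0.04, 25
mean_p, var_p = equal_weight_portfolio(m, d, n_assets)

volatility_ratio = math.sqrt(var_p / d)  # expected 1 / sqrt(M)
drawdown_ratio = worst_drawdown_scale(mean_p, var_p) / worst_drawdown_scale(m, d)  # expected 1 / M
duration_ratio = (var_p / mean_p ** 2) / (d / m ** 2)  # ratio of D / m^2, expected 1 / M

print(volatility_ratio, drawdown_ratio, duration_ratio)
```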

This qualitative idea stumbles over several difficulties. First of all, the fluctuations
of financial assets are in general strongly correlated; this substantially decreases
the possibility of true diversification, and requires a suitable theory to deal
with these correlations and construct an ‘optimal’ portfolio: this is Markowitz’s
theory, to be detailed below. Furthermore, since price fluctuations can be strongly
non-Gaussian, a volatility-based measure of risk might be ill-suited: one should
rather try to minimize the value-at-risk of the portfolio. It is therefore interesting to
look for an extension of the classical formalism, allowing one to devise minimum
VaR portfolios. This will be presented in the next sections.

Now, one is immediately confronted with the problem of defining properly an
‘optimal’ portfolio. Usually, one invokes the rather abstract concept of ‘utility
functions’, on which we shall briefly comment in this section, in particular to show
that it does not naturally accommodate the notion of value-at-risk.

We will call $W_T$ the wealth of a given operator at time $t = T$. One
argues that the level of satisfaction of this operator is quantified by a certain
function of $W_T$ only,⁷ which one usually calls the ‘utility function’ $U(W_T)$. This
function is furthermore taken to be continuous and even twice differentiable. The
postulated ‘rational’ behaviour for the operator is then to look for investments
which maximize his expected utility, averaged over all possible histories of price
changes:

$$\langle U(W_T) \rangle = \int P(W_T)\, U(W_T)\, dW_T. \qquad (3.27)$$

The utility function should be non-decreasing: a larger profit is clearly always more
satisfying. One can furthermore consider the case where the distribution $P(W_T)$ is
sharply peaked around its mean value $\langle W_T \rangle = W_0 + mT$. Performing a Taylor
expansion of $\langle U(W_T) \rangle$ around $U(W_0 + mT)$ to second order, one deduces that the
utility function must be such that:

$$\frac{d^2 U}{dW_T^2} < 0. \qquad (3.28)$$

This property reflects the fact that for the same average return, a less risky
investment should always be preferred.

A simple example of utility function compatible with the above constraints is the
exponential function $U(W_T) = -\exp[-W_T/w_0]$. Note that $w_0$ has the dimensions
of a wealth, thereby fixing a wealth scale in the problem. A natural candidate is the
initial wealth of the operator, $w_0 \propto W_0$. If $P(W_T)$ is Gaussian, of mean $W_0 + mT$
and variance $DT$, one finds that the expected utility is given by:

$$\langle U \rangle = -\exp\left[-\frac{W_0}{w_0} - \frac{T}{w_0}\left(m - \frac{D}{2w_0}\right)\right]. \qquad (3.29)$$

One could think of constructing a utility function with no intrinsic wealth scale
by choosing a power-law: $U(W_T) = (W_T/w_0)^\alpha$ with $\alpha < 1$ to ensure the correct
convexity. Indeed, in this case a change of $w_0$ can be reabsorbed in a change of scale
of the utility function itself. However, this definition cannot allow for negative final
wealths, and is thus problematic.

7 But not of the whole ‘history’ of his wealth between $t = 0$ and $T$. One thus assumes that the operator
is insensitive to what can happen between these two dates; this is not very realistic. One could however
generalize the concept of utility function and deal with utility functionals $U(\{W(t)\}_{0 \le t \le T})$.

Fig. 3.5. Example of a ‘utility function’ with thresholds, where the utility function is non-
continuous. These thresholds correspond to special values for the profit, or the loss, and
are often of purely psychological origin.

Despite the fact that these postulates sound reasonable, and despite the very large
number of academic studies based on the concept of utility function, this axiomatic
approach suffers from a certain number of fundamental flaws. For example, it is not
clear that one could ever measure the utility function used by a given agent on the
markets.⁸ The theoretical results are thus:

• Either relatively weak, because independent of the special form of the utility
function, and only based on its general properties.
• Or rather arbitrary, because based on a specific, but unjustified, form for $U(W_T)$.

On the other hand, the idea that the utility function is regular is probably not
always realistic. The satisfaction of an operator is often governed by rather sharp
thresholds, separated by regions of indifference (Fig. 3.5). For example, one can
be in a situation where a specific project can only be achieved if the profit
$\Delta W = W_T - W_0$ exceeds a certain amount. Symmetrically, the clients of a fund manager
will take their money away as soon as the losses exceed a certain value: this is
the strategy of ‘stop-losses’, which fix a level for acceptable losses, beyond which
the position is closed. The existence of option markets (which allow one to limit
the potential losses below a certain level; see Chapter 4), or of items the price
of which is $99 rather than $100, are concrete examples of the existence of these
thresholds where ‘satisfaction’ changes abruptly. Therefore, the utility function $U$
is not necessarily continuous. This remark is actually intimately related to the fact
that the value-at-risk is often a better measure of risk than the variance. Let us
indeed assume that the operator is ‘happy’ if $\Delta W > -\Lambda$ and ‘unhappy’ whenever
$\Delta W < -\Lambda$. This corresponds formally to the following utility function:

$$U_\Lambda(\Delta W) = \begin{cases} U_1 & (\Delta W > -\Lambda) \\ U_2 & (\Delta W < -\Lambda) \end{cases} \qquad (3.30)$$

with $U_2 - U_1 < 0$.
The expected utility is then simply related to the loss probability:

$$\langle U_\Lambda \rangle = U_1 + (U_2 - U_1) \int_{-\infty}^{-\Lambda} P(\Delta W)\, d\Delta W = U_1 - |U_1 - U_2|\, \mathcal{P}. \qquad (3.31)$$

Therefore, optimizing the expected utility is in this case tantamount to minimizing
the probability of losing more than $\Lambda$. Despite this rather appealing property, which
certainly corresponds to the behaviour of some market operators, the function
$U_\Lambda(\Delta W)$ does not satisfy the above criteria (continuity and negative curvature).

8 Even the idea that an operator would really optimize his expected utility, and not take decisions partly based on
‘non-rational’ arguments, is far from being obvious. On this point, see: M. Marsili, Y. C. Zhang, Fluctuations
around Nash equilibria in Game Theory, Physica A245, 181 (1997).
Confronted with the problem of choosing between risk (as measured by the
variance) and return, another very natural strategy (for those not acquainted with
utility functions) would be to compare the average return to the potential loss
$\sqrt{DT}$. This can thus be thought of as defining a risk-corrected, ‘pessimistic’
estimate of the profit, as:

$$m_\lambda T = mT - \lambda\sqrt{DT}, \qquad (3.32)$$

where $\lambda$ is an arbitrary coefficient that measures the pessimism (or the risk
aversion) of the operator. A rather natural procedure would then be to look for
the optimal portfolio which maximizes the risk-corrected return $m_\lambda$. However, this
optimal portfolio cannot be obtained using the standard utility function formalism.
For example, Eq. (3.29) shows that the object which should be maximized is not
$mT - \lambda\sqrt{DT}$ but rather $mT - DT/2w_0$. This amounts to comparing the average
profit to the square of the potential losses, divided by the reference wealth scale $w_0$,
a quantity that depends a priori on the operator.⁹ On the other hand, the quantity
$mT - \lambda\sqrt{DT}$ is directly related (at least in a Gaussian world) to the value-at-risk
$\Lambda_{\rm VaR}$, cf. Eq. (3.16).


9 This comparison is actually meaningful, since it corresponds to comparing the reference wealth $w_0$ to the
order of magnitude of the worst drawdown $D/2m$, cf. Eq. (3.24).

This can be expressed slightly differently: a reasonable objective could be to
maximize the value of the ‘probable gain’ $G_p$, such that the probability of earning
more is equal to a certain probability $p$:¹⁰

$$\int_{G_p}^{+\infty} P(\Delta W)\, d\Delta W = p. \qquad (3.33)$$

In the case where $P(\Delta W)$ is Gaussian, this amounts to maximizing $mT - \lambda\sqrt{DT}$,
where $\lambda$ is related to $p$ in a simple manner. Now, one can show that it is impossible
to construct a utility function such that, in the general case, different strategies
can be ordered according to their probable gain $G_p$. Therefore, the concepts of
loss probability, value-at-risk or probable gain cannot be accommodated naturally
within the framework of utility functions. Still, the idea that the quantity which
is of most concern and that should be optimized is the value-at-risk sounds
perfectly rational. This is at least the conceptual choice that we make in the present
monograph.
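For a Gaussian $P(\Delta W)$ the relation between $\lambda$ and $p$ mentioned above can be made explicit: Eq. (3.33) gives $G_p = mT - \lambda\sqrt{DT}$ with $\lambda = \Phi^{-1}(p)$, where $\Phi^{-1}$ is the standard normal quantile function. The sketch below is an illustration with assumed parameter values and a hypothetical function name.

```python
import math
from statistics import NormalDist


def probable_gain(m: float, d: float, t: float, p: float) -> tuple[float, float]:
    """Gain G_p exceeded with probability p, and the corresponding coefficient lambda,
    for a Gaussian profit of mean m*t and variance d*t (Eqs. (3.32) and (3.33))."""
    lam = NormalDist().inv_cdf(p)
    return m * t - lam * math.sqrt(d * t), lam


m, d, t, p = 0.10, 0.04, 1.0, 0.95
g_p, lam = probable_gain(m, d, t, p)

# Check: the probability of earning more than G_p should be p.
prob_above = 1.0 - NormalDist(mu=m * t, sigma=math.sqrt(d * t)).cdf(g_p)
print(f"lambda = {lam:.3f}, G_p = {g_p:.4f}, P(gain > G_p) = {prob_above:.3f}")
```

With $p = 95\%$ the probable gain is negative here: it is minus the value-at-risk at the 5% level, in line with footnote 10.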


3.1.5 Conclusion 


Let us now recapitulate the main points of this section:

• The usual measure of risk through a Gaussian volatility is not always well suited to
the real world. The tails of the distributions, where the large events lie, are very
badly described by a Gaussian law: this leads to a systematic underestimation
of the extreme risks. Sometimes, the measurement of the volatility on historical
data is difficult, precisely because of the presence of these large fluctuations.

• The measure of risk through the probability of loss, or the value-at-risk, on the
other hand, precisely focuses on the tails. Extreme events are considered as the
true source of risk, whereas the small fluctuations, which contribute to the ‘centre’ of
the distributions (and contribute to the volatility), can be seen as a background
noise, inherent to the very activity of financial markets, but not relevant for risk
assessment.

• From a theoretical point of view, this definition of risk (based on extreme
events) does not easily fit into the classical ‘utility function’ framework. The
minimization of a loss probability rather assumes that there exist well-defined
thresholds (possibly different for each operator) where the ‘utility function’
is discontinuous.¹¹ The concept of ‘value-at-risk’, or probable gain, cannot be
naturally dealt with by using utility functions.

10 Maximizing $G_p$ is thus equivalent to minimizing $\Lambda_{\rm VaR}$ such that $\mathcal{P}_{\rm VaR} = 1 - p$.
11 It is possible that the presence of these thresholds actually plays an important role in the fact that the price
fluctuations are strongly non-Gaussian.


3.2 Portfolios of uncorrelated assets


The aim of this section is to explain, in the very simple case where all assets that
can be mixed in a portfolio are uncorrelated, how the trade-off between risk and
return can be dealt with. (The case where some correlations between the asset
fluctuations exist will be considered in the next section.) One thus considers a set of
$M$ different risky assets $X_i$, $i = 1, \ldots, M$, and one risk-less asset $X_0$. The number
of asset $i$ in the portfolio is $n_i$, and its present value is $x_i^0$. If the total wealth to
be invested in the portfolio is $W$, then the $n_i$'s are constrained to be such that
$\sum_{i=0}^M n_i x_i^0 = W$. We shall rather use the weight of asset $i$ in the portfolio, defined
as: $p_i = n_i x_i^0/W$, which therefore must be normalized to one: $\sum_{i=0}^M p_i = 1$. The
$p_i$'s can be negative (short positions). The value of the portfolio at time $T$ is given
by: $S = \sum_{i=0}^M n_i x_i(T) = W \sum_{i=0}^M p_i x_i(T)/x_i^0$. In the following, we will set the
initial wealth $W$ to 1, and redefine each asset $i$ in such a way that all initial prices
are equal to $x_i^0 = 1$. (Therefore, the average return $m_i$ and variance $D_i$ that we will
consider below must be understood as relative, rather than absolute.)

One furthermore assumes that the average return $m_i$ is known. This hypothesis
is actually very strong, since it assumes for example that past returns can be used
as estimators of future returns, i.e. that time series are to some extent stationary.
However, this is very far from the truth: the life of a company (in particular
a high-tech one) is very clearly non-stationary; a whole sector of activity can be
booming or collapsing, depending upon global factors, not graspable within a
purely statistical framework. Furthermore, the markets themselves evolve with
time, and it is clear that some statistical parameters do depend on time, and
have significantly shifted over the past 20 years. This means that the empirical
determination of the average return is difficult: volatilities are such that at least
several years are needed to obtain a reasonable signal-to-noise ratio; this time must
indeed be large compared to the ‘security time’ $\hat{T}$. But as discussed above, several
years is also the time scale over which the intrinsically non-stationary nature of the
markets starts being important.

One should thus rather understand $m_i$ as an ‘expected’ (or anticipated) future
return, which includes some extra information (or intuition) available to the
investor. These $m_i$'s can therefore vary from one investor to the next. The relevant
question is then to determine the composition of an optimal portfolio compatible
with the information contained in the knowledge of the different $m_i$'s.

The determination of the risk parameters is a priori subject to the same caveat.
We have actually seen in Section 2.7 that the empirical determination of the
correlation matrices contains a large amount of noise, which blurs the true information.
However, the statistical nature of the fluctuations seems to be more robust in time
than the average returns. The analysis of past price change distributions appears
to be, to a certain extent, predictive for future price fluctuations. It, however,
sometimes happens that the correlations between two assets change abruptly.


3.2.1 Uncorrelated Gaussian assets 


Let us suppose that the variation of the value of the $i$th asset $X_i$ over the time
interval $T$ is Gaussian, centred around $m_i T$ and of variance $D_i T$. The portfolio
$\mathbf{p} = \{p_0, p_1, \ldots, p_M\}$ as a whole also obeys Gaussian statistics (since the Gaussian
is stable). The average return $m_p$ of the portfolio is given by:

$$m_p = \sum_{i=0}^M p_i m_i = m_0 + \sum_{i=1}^M p_i (m_i - m_0), \qquad (3.34)$$

where we have used the constraint $\sum_{i=0}^M p_i = 1$ to introduce the excess return
$m_i - m_0$, as compared to the risk-free asset ($i = 0$). If the $X_i$'s are all independent,
the total variance of the portfolio is given by:

$$D_p = \sum_{i=1}^M p_i^2 D_i, \qquad (3.35)$$

(since $D_0$ is zero by assumption). If one tries to minimize the variance without any
constraint on the average return $m_p$, one obviously finds the trivial solution where
all the weight is concentrated on the risk-free asset:

$$p_i = 0 \quad (i \neq 0); \qquad p_0 = 1. \qquad (3.36)$$

Conversely, the maximization of the return without any risk constraint leads
to a full concentration of the portfolio on the asset with the highest return.

More realistically, one can look for a tradeoff between risk and return, by
imposing a certain average return $m_p$, and by looking for the least risky portfolio
(for Gaussian assets, risk and variance are identical). This can be achieved by
introducing a Lagrange multiplier in order to enforce the constraint on the average
return:

$$ \left. \frac{\partial (D_p - \zeta m_p)}{\partial p_i} \right|_{p_i = p_i^*} = 0 \quad (i \neq 0), \tag{3.37} $$

while the weight of the risk-free asset $p_0^*$ is determined via the equation $\sum_i p_i^* = 1$.
The value of $\zeta$ is ultimately fixed such that the average value of the return is
precisely $m_p$. Therefore:

$$ 2 p_i^* D_i = \zeta (m_i - m_0), \tag{3.38} $$

Fig. 3.6. ‘Efficient frontier’ in the return/risk plane $m_p$, $D_p$. In the absence of constraints,
this line is a parabola (dark line). If some constraints are imposed (for example, that all the
weights $p_i$ should be positive), the boundary moves downwards (dotted line).


and the equation for $\zeta$:

$$ m_p - m_0 = \frac{\zeta}{2} \sum_{i=1}^{M} \frac{(m_i - m_0)^2}{D_i}. \tag{3.39} $$

The variance of this optimal portfolio is therefore given by:

$$ D_p^* = \frac{\zeta^2}{4} \sum_{i=1}^{M} \frac{(m_i - m_0)^2}{D_i}. \tag{3.40} $$

The case $\zeta = 0$ corresponds to the risk-free portfolio $p_0^* = 1$. The set of all optimal
portfolios is thus described by the parameter $\zeta$, and defines a parabola in the $m_p$, $D_p$
plane (compare the last two equations above, and see Fig. 3.6). This line is called
the ‘efficient frontier’; all portfolios must lie above this line. Finally, the Lagrange
multiplier $\zeta$ itself has a direct interpretation: Equation (3.24) tells us that the worst
drawdown is of order $D_p^*/2m_p$, which is, using the above equations, equal to $\zeta/4$.
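As a numerical illustration of Eqs (3.38) to (3.40), the following Python sketch computes the optimal weights for a few independent Gaussian assets plus a risk-free asset. The function name `markowitz_uncorrelated` and the numerical values of the returns and variances are hypothetical, chosen only for the example.

```python
def markowitz_uncorrelated(m, D, m0, m_target):
    """Optimal weights on independent risky assets plus a risk-free asset.

    m, D     : mean returns m_i and variances D_i of the risky assets
    m0       : return of the risk-free asset
    m_target : required average return m_p of the portfolio
    """
    # S = sum_i (m_i - m0)^2 / D_i appears in both Eq. (3.39) and Eq. (3.40).
    S = sum((mi - m0) ** 2 / Di for mi, Di in zip(m, D))
    zeta = 2.0 * (m_target - m0) / S                              # Eq. (3.39)
    p = [zeta * (mi - m0) / (2.0 * Di) for mi, Di in zip(m, D)]   # Eq. (3.38)
    p0 = 1.0 - sum(p)                                             # normalization
    variance = sum(pi ** 2 * Di for pi, Di in zip(p, D))          # Eq. (3.35)
    return zeta, p0, p, variance


# Hypothetical inputs: three risky assets and a 3% risk-free return.
m = [0.08, 0.12, 0.10]
D = [0.04, 0.09, 0.05]
m0 = 0.03
zeta, p0, p, variance = markowitz_uncorrelated(m, D, m0, m_target=0.07)
```

Scanning `m_target` traces out the parabola $D_p^* = (m_p - m_0)^2 / \sum_i [(m_i - m_0)^2/D_i]$ obtained by eliminating $\zeta$ between Eqs (3.39) and (3.40).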

The case where the portfolio only contains risky assets (i.e. $p_0 = 0$) can be
treated in a similar fashion. One introduces a second Lagrange multiplier $\zeta'$ to deal
with the normalization constraint $\sum_{i=1}^{M} p_i = 1$. Therefore, one finds:

$$ p_i^* = \frac{\zeta m_i + \zeta'}{2 D_i}. \tag{3.41} $$

The least risky portfolio corresponds to the one such that $\zeta = 0$ (no constraint on
the average return):

$$ p_i^* = \frac{1}{Z D_i}, \qquad Z = \sum_{j=1}^{M} \frac{1}{D_j}. \tag{3.42} $$

Its total variance is given by $D_p^* = 1/Z$. If all the $D_i$'s are of the same order of
magnitude, one has $Z \sim M/D$; therefore, one finds the result, expected from the
CLT, that the variance of the portfolio is $M$ times smaller than the variance of the
individual assets.
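The minimum-variance portfolio of Eq. (3.42) can be checked with a few lines of Python. This is only a sketch: the function name `minimum_variance_weights` and the sample variances are illustrative assumptions.

```python
def minimum_variance_weights(D):
    """Least risky portfolio of independent risky assets, Eq. (3.42)."""
    Z = sum(1.0 / Di for Di in D)
    p = [1.0 / (Z * Di) for Di in D]
    variance = sum(pi ** 2 * Di for pi, Di in zip(p, D))  # equals 1/Z
    return p, variance, Z


# Ten assets with the same (hypothetical) variance D = 0.04:
# the portfolio variance is M = 10 times smaller than D.
p_equal, var_equal, Z_equal = minimum_variance_weights([0.04] * 10)

# Unequal variances: the weights are inversely proportional to D_i.
p_mixed, var_mixed, Z_mixed = minimum_variance_weights([0.01, 0.04, 0.09])
```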

In practice, one often adds extra constraints to the weights $p_i^*$ in the form of
linear inequalities, such as $p_i^* \geq 0$ (no short positions). The solution is then
more involved, but is still unique. Geometrically, this amounts to looking for the
restriction of a paraboloid to a hyperplane, which remains a paraboloid. The
efficient border is then shifted downwards (Fig. 3.6). A much richer case is when
the constraint is non-linear. For example, on futures markets, margin calls require
that a certain amount of money is left as a deposit, whether the position is long
($p_i > 0$) or short ($p_i < 0$). One can then impose a leverage constraint, such
that $\sum_{i=1}^{M} |p_i| = f$, where $f$ is the fraction of wealth invested as a deposit. This
constraint leads to a much more complex problem, similar to the one encountered
in hard optimization problems, where an exponentially large (in $M$) number of
quasi-degenerate solutions can be found.^12


Effective asset number in a portfolio 


It is useful to introduce an objective way to measure the diversification, or the asset
concentration, in a given portfolio. Once such an indicator is available, one can
actually use it as a constraint to construct portfolios with a minimum degree of
diversification. Consider the quantity $Y_2$ defined as:

$$ Y_2 = \sum_{i=1}^{M} (p_i^*)^2. \tag{3.43} $$

If a subset $M' \leq M$ of all $p_i^*$ are equal to $1/M'$, while the others are zero, one
finds $Y_2 = 1/M'$. More generally, $Y_2$ represents the average weight of an asset in
the portfolio, since it is constructed as the average of $p_i^*$ itself. It is thus natural
to define the ‘effective’ number of assets in the portfolio as $M_{\mathrm{eff}} = 1/Y_2$. In order
to avoid an overconcentration of the portfolio on very few assets (a problem often
encountered in practice), one can look for the optimal portfolio with a given value
for $Y_2$. This amounts to introducing another Lagrange multiplier $\zeta''$, associated to
$Y_2$. The equation for $p_i^*$ then becomes:

$$ p_i^* = \frac{\zeta m_i + \zeta'}{2 (D_i + \zeta'')}. \tag{3.44} $$

An example of the modified efficient border is given in Figure 3.7.

12 On this point, see S. Galluccio, J.-P. Bouchaud, M. Potters, Portfolio optimisation, spin-glasses and random
matrix theory, Physica, A259, 449 (1998), and references therein.

Fig. 3.7. Example of a standard efficient border $\zeta'' = 0$ (thick line) with four risky assets.
If one imposes that the effective number of assets is equal to 2, one finds the sub-efficient
border drawn in dotted line, which touches the efficient border at $r_1$, $r_2$. The inset shows
the effective asset number of the unconstrained optimal portfolio ($\zeta'' = 0$) as a function of
average return. The optimal portfolio satisfying $M_{\mathrm{eff}} \geq 2$ is therefore given by the standard
portfolio for returns between $r_1$ and $r_2$ and by the $M_{\mathrm{eff}} = 2$ portfolios otherwise.
More generally, one could have considered the quantity $Y_q$ defined as:

$$ Y_q = \sum_{i=1}^{M} (p_i^*)^q, \tag{3.45} $$

and used it to define the effective number of assets via $Y_q = M_{\mathrm{eff}}^{1-q}$. It is interesting
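to note that the quantity of missing information (or entropy) $\mathcal{I}$ associated to the
very choice of the $p_i^*$'s is related to $Y_q$ when $q \to 1$. Indeed, one has:

$$ \mathcal{I} = -\sum_{i=1}^{M} p_i^* \log p_i^* = -\left. \frac{\partial Y_q}{\partial q} \right|_{q=1}. \tag{3.46} $$

Approximating $Y_q$ as a function of $q$ by a straight line thus leads to $\mathcal{I} \simeq 1 - Y_2$.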
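The diversification indicators above are easy to evaluate numerically. The sketch below uses hypothetical helper names (`Y`, `effective_number`, `entropy`) and assumes the standard entropy $-\sum_i p_i \log p_i$; it checks that $M_{\mathrm{eff}}$ recovers $M'$ for an equally weighted subset, and that the entropy is minus the slope of $Y_q$ at $q = 1$.

```python
import math


def Y(p, q):
    """Y_q = sum_i p_i^q, Eq. (3.45); zero weights do not contribute."""
    return sum(pi ** q for pi in p if pi > 0)


def effective_number(p, q=2):
    """Effective number of assets from Y_q = M_eff^(1-q); requires q != 1."""
    return Y(p, q) ** (1.0 / (1.0 - q))


def entropy(p):
    """Missing information -sum_i p_i log p_i (the q -> 1 limit)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


# Four assets out of ten hold all the weight, equally shared.
p = [0.25] * 4 + [0.0] * 6
m_eff = effective_number(p)            # uses Y_2
info = entropy(p)                      # log(4) for this portfolio
# Central finite difference for dY_q/dq at q = 1.
h = 1e-6
slope = (Y(p, 1 + h) - Y(p, 1 - h)) / (2 * h)
```

For such a concentrated portfolio the straight-line estimate $1 - Y_2$ is only a rough guide to the entropy; it becomes accurate when the weights are close to a single dominant asset.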


3.2.2 Uncorrelated ‘power-law’ assets


As we have already underlined, the tails of the distributions are often non-Gaussian.
In this case, the minimization of the variance is not necessarily equivalent to an
optimal control of the large fluctuations. The case where these distribution tails
are power-laws is interesting because one can then explicitly solve the problem of
the minimization of the value-at-risk of the full portfolio. Let us thus assume that
the fluctuations of each asset $X_i$ are described, in the region of large losses, by a
probability density that decays as a power-law:

$$ P_T(\delta x_i) \underset{\delta x_i \to -\infty}{\simeq} \frac{\mu A_i^{\mu}}{|\delta x_i|^{1+\mu}}, \tag{3.47} $$

with an arbitrary exponent $\mu$, restricted however to be larger than 1, such that the
average return is well defined. (The distinction between the cases $\mu < 2$, for which
the variance diverges, and $\mu > 2$ will be considered below.) The coefficient $A_i$
provides an order of magnitude for the extreme losses associated with the asset $i$
(cf. Eq. (3.14)).

As we have mentioned in Section 1.5.2, the power-law tails are interesting
because they are stable upon addition: the tail amplitudes $A_i^{\mu}$ (that generalize the
variance) simply add to describe the far-tail of the distribution of the sum. Using
the results of Appendix C, one can show that if the asset $X_i$ is characterized by a
tail amplitude $A_i^{\mu}$, the quantity $p_i X_i$ has a tail amplitude equal to $p_i^{\mu} A_i^{\mu}$. The tail
amplitude of the global portfolio $\mathbf{p}$ is thus given by:

$$ A_p^{\mu} = \sum_{i=1}^{M} p_i^{\mu} A_i^{\mu}, \tag{3.48} $$

and the probability that the loss exceeds a certain level $\Lambda$ is given by $\mathcal{P} = A_p^{\mu}/\Lambda^{\mu}$.
Hence, independently of the chosen loss level $\Lambda$, the minimization of the loss
probability $\mathcal{P}$ requires the minimization of the tail amplitude $A_p^{\mu}$; the optimal
portfolio is therefore independent of $\Lambda$. (This is not true in general: see Section
3.2.4.) The minimization of $A_p^{\mu}$ for a fixed average return $m_p$ leads to the following
equations (valid if $\mu > 1$):

$$ \mu \, (p_i^*)^{\mu-1} A_i^{\mu} = \zeta (m_i - m_0), \tag{3.49} $$

with an equation to fix $\zeta$:

$$ \left( \frac{\zeta}{\mu} \right)^{\frac{1}{\mu-1}} \sum_{i=1}^{M} \frac{(m_i - m_0)^{\frac{\mu}{\mu-1}}}{A_i^{\frac{\mu}{\mu-1}}} = m_p - m_0. \tag{3.50} $$

The optimal loss probability is then given by:

$$ \mathcal{P}^* = \frac{1}{\Lambda^{\mu}} \left( \frac{\zeta}{\mu} \right)^{\frac{\mu}{\mu-1}} \sum_{i=1}^{M} \frac{(m_i - m_0)^{\frac{\mu}{\mu-1}}}{A_i^{\frac{\mu}{\mu-1}}}. \tag{3.51} $$

Therefore, the concept of ‘efficient border’ is still valid in this case: in the plane
return/probability of loss, it is similar to the dotted line of Figure 3.6. Eliminating
$\zeta$ from the above two equations, one finds that the shape of this line is given by
$\mathcal{P}^* \propto (m_p - m_0)^{\mu}$. The parabola is recovered in the limit $\mu = 2$.

In the case where the risk-free asset cannot be included in the portfolio, the
optimal portfolio which minimizes extreme risks with no constraint on the average
return is given by:

$$ p_i^* = \frac{1}{Z} A_i^{-\frac{\mu}{\mu-1}}, \qquad Z = \sum_{j=1}^{M} A_j^{-\frac{\mu}{\mu-1}}, \tag{3.52} $$

and the corresponding loss probability is equal to:

$$ \mathcal{P}^* = \frac{1}{\Lambda^{\mu}} Z^{1-\mu}. \tag{3.53} $$

If all assets have comparable tail amplitudes $A_i \sim A$, one finds that $Z \sim
M A^{-\mu/(\mu-1)}$. Therefore, the probability of large losses for the optimal portfolio
is a factor $M^{\mu-1}$ smaller than the individual probability of loss.

Note again that this result is only valid if $\mu > 1$. If $\mu < 1$, one finds that the risk increases
with the number of assets $M$. In this case, when the number of assets is increased, the
probability of an unfavourable event also increases: indeed, for $\mu < 1$ this largest event
is so large that it dominates over all the others. The minimization of risk in this case leads
to $p_{i_{\min}} = 1$, where $i_{\min}$ is the least risky asset, in the sense that $A_{i_{\min}}^{\mu} = \min_i \{A_i^{\mu}\}$.
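A short Python sketch of Eqs (3.48), (3.52) and (3.53) follows. The function name `tail_optimal_weights` and the sample amplitudes are hypothetical; the input list holds the tail amplitudes $A_i^{\mu}$ themselves, so that $A_i^{-\mu/(\mu-1)} = (A_i^{\mu})^{-1/(\mu-1)}$.

```python
def tail_optimal_weights(A_mu, mu):
    """Weights minimizing the portfolio tail amplitude (no risk-free asset).

    A_mu : list of tail amplitudes A_i^mu
    mu   : common tail exponent, must be larger than 1
    """
    if mu <= 1.0:
        raise ValueError("the result is only valid for mu > 1")
    # A_i^(-mu/(mu-1)) written in terms of the amplitude A_i^mu.
    w = [a ** (-1.0 / (mu - 1.0)) for a in A_mu]
    Z = sum(w)
    p = [wi / Z for wi in w]                                  # Eq. (3.52)
    tail_amplitude = sum(pi ** mu * a for pi, a in zip(p, A_mu))  # Eq. (3.48)
    return p, tail_amplitude, Z


# Ten assets with identical (hypothetical) amplitude A^mu = 2 and mu = 3:
# the portfolio amplitude is reduced by a factor M^(mu-1) = 100.
p_eq, amp_eq, Z_eq = tail_optimal_weights([2.0] * 10, 3.0)

# For mu = 2 the weights reduce to the Gaussian rule p_i proportional to 1/D_i.
p_two, amp_two, Z_two = tail_optimal_weights([1.0, 8.0], 2.0)
```

The loss probability at level $\Lambda$ is then `tail_amplitude / Lambda**mu`, which by Eq. (3.53) equals $Z^{1-\mu}/\Lambda^{\mu}$.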


One should now distinguish the cases $\mu < 2$ and $\mu > 2$. Despite the fact that
the asymptotic power-law behaviour is stable under addition for all values of $\mu$,
the tail is progressively ‘eaten up’ by the centre of the distribution for $\mu > 2$, since
the CLT applies. Only when $\mu < 2$ does this tail remain untouched. We thus again
recover the arguments of Section 1.6.4, already encountered when we discussed
the time dependence of the VaR. One should therefore distinguish two cases: if
$D_p$ is the variance of the portfolio $\mathbf{p}$ (which is finite if $\mu > 2$), the distribution
of the changes $\delta S$ of the value of the portfolio $\mathbf{p}$ is approximately Gaussian if
$|\delta S| < \sqrt{D_p T \log(M)}$, and becomes a power-law with a tail amplitude given by
$A_p^{\mu}$ beyond this point. The question is thus whether the loss level $\Lambda$ that one wishes
to control is smaller or larger than this crossover value:

• If $\Lambda < \sqrt{D_p T \log(M)}$, the minimization of the VaR becomes equivalent to
the minimization of the variance, and one recovers the Markowitz procedure
explained in the previous paragraph in the case of Gaussian assets.
• If on the contrary $\Lambda > \sqrt{D_p T \log(M)}$, then the formulae established in the
present section are valid even when $\mu > 2$.

Note that the growth of $\sqrt{\log(M)}$ with $M$ is so slow that the Gaussian CLT is
not of great help in the present case. The optimal portfolios in terms of the VaR are
not those with the minimal variance, and vice versa.


3.2.3 ‘Exponential’ assets 


Suppose now that the distribution of price variations is a symmetric exponential
around a zero mean value ($m_i = 0$):

$$ P(\delta x_i) = \frac{\alpha_i}{2} \exp[-\alpha_i |\delta x_i|], \tag{3.54} $$

where $\alpha_i^{-1}$ gives the order of magnitude of the fluctuations of $X_i$ (more precisely,
$\sqrt{2}/\alpha_i$ is the RMS of the fluctuations). The variations of the full portfolio $\mathbf{p}$,
defined as $\delta S = \sum_{i=1}^{M} p_i \delta x_i$, are distributed according to:

$$ P(\delta S) = \int \frac{dz}{2\pi} \frac{\exp(iz \, \delta S)}{\prod_{i=1}^{M} [1 + (z p_i \alpha_i^{-1})^2]}, \tag{3.55} $$

where we have used Eq. (1.50) and the fact that the Fourier transform of the
exponential distribution Eq. (3.54) is given by:

$$ \hat{P}(z) = \frac{1}{1 + (z \alpha^{-1})^2}. \tag{3.56} $$

Now, using the method of residues, it is simple to establish the following expression
for $P(\delta S)$ (valid whenever the $\alpha_i/p_i$ are all different):

$$ P(\delta S) = \frac{1}{2} \sum_{i=1}^{M} \frac{\alpha_i}{p_i} \left[ \prod_{j \neq i} \frac{1}{1 - [(p_j \alpha_i)/(p_i \alpha_j)]^2} \right] \exp\left[ -\frac{\alpha_i}{p_i} |\delta S| \right]. \tag{3.57} $$

The probability for extreme losses is thus equal to:

$$ P(\delta S < -\Lambda) \underset{\Lambda \to \infty}{\simeq} \frac{1}{2} \left[ \prod_{j \neq i^*} \frac{1}{1 - [(p_j \alpha^*)/\alpha_j]^2} \right] \exp[-\alpha^* \Lambda], \tag{3.58} $$

where $\alpha^*$ is equal to the smallest of all ratios $\alpha_i/p_i$, and $i^*$ the corresponding value
of $i$. The order of magnitude of the extreme losses is therefore given by $1/\alpha^*$. This
is then the quantity to be minimized in a value-at-risk context. This amounts to
choosing the $p_i$'s such that $\min_i \{\alpha_i/p_i\}$ is as large as possible.
This minimization problem can be solved using the following trick. One can
write formally that:

$$ \frac{1}{\alpha^*} = \max_i \left\{ \frac{p_i}{\alpha_i} \right\} = \lim_{\mu \to \infty} \left( \sum_{i=1}^{M} \frac{p_i^{\mu}}{\alpha_i^{\mu}} \right)^{1/\mu}. \tag{3.59} $$

This equality is true because in the limit $\mu \to \infty$, the largest term of the sum
dominates over all the others. (The choice of the notation $\mu$ is on purpose: see
below.) For $\mu$ large but fixed, one can perform the minimization with respect to the
$p_i$'s, using a Lagrange multiplier to enforce normalization. One finds that:

$$ (p_i^*)^{\mu-1} \propto \alpha_i^{\mu}. \tag{3.60} $$

In the limit $\mu \to \infty$, and imposing $\sum_{i=1}^{M} p_i = 1$, one finally obtains:

$$ p_i^* = \frac{\alpha_i}{\sum_{j=1}^{M} \alpha_j}, \tag{3.61} $$

which leads to $\alpha^* = \sum_{i=1}^{M} \alpha_i$. In this case, however, all $\alpha_i/p_i^*$ are equal to
$\alpha^*$ and the result Eq. (3.57) must be slightly altered. However, the asymptotic
exponential fall-off, governed by $\alpha^*$, is still true (up to polynomial corrections:
cf. Section 1.6.4). One again finds that if all the $\alpha_i$'s are comparable, the potential
losses, measured through $1/\alpha^*$, are divided by a factor $M$.

Note that the optimal weights are such that $p_i^* \propto \alpha_i$ if one tries to minimize
the probability of extreme losses, whereas one would have found $p_i^* \propto \alpha_i^2$ if the
goal was to minimize the variance of the portfolio, corresponding to $\mu = 2$ (cf.
Eq. (3.42)). In other words, this is an explicit example where one can see that
minimizing the variance actually increases the value-at-risk.

Formally, as we have noticed in Section 1.3.4, the exponential distribution corresponds
to the limit $\mu \to \infty$ of a power-law distribution: an exponential decay is indeed more
rapid than any power-law. Technically, we have indeed established in this section that the
minimization of extreme risks in the exponential case is identical to the one obtained in the
previous section in the limit $\mu \to \infty$ (see Eq. (3.52)).
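The trade-off between variance and extreme losses for exponential assets can be made concrete with a small numerical comparison. The sketch below uses hypothetical values of the $\alpha_i$ and hypothetical helper names; the variance of each asset is taken as $D_i = 2/\alpha_i^2$, consistent with the RMS quoted after Eq. (3.54).

```python
def normalized(w):
    """Rescale positive weights so that they sum to one."""
    total = sum(w)
    return [x / total for x in w]


def extreme_loss_scale(p, alpha):
    """1/alpha* = max_i p_i/alpha_i, the scale of extreme losses, Eq. (3.59)."""
    return max(pi / ai for pi, ai in zip(p, alpha))


def portfolio_variance(p, alpha):
    """Variance of the portfolio, with D_i = 2/alpha_i^2 for each asset."""
    return sum(pi ** 2 * 2.0 / ai ** 2 for pi, ai in zip(p, alpha))


alpha = [1.0, 2.0, 4.0]                        # hypothetical decay rates
p_tail = normalized(alpha)                     # Eq. (3.61): p_i ~ alpha_i
p_var = normalized([a ** 2 for a in alpha])    # Eq. (3.42): p_i ~ 1/D_i

scale_tail = extreme_loss_scale(p_tail, alpha)
scale_var = extreme_loss_scale(p_var, alpha)
var_tail = portfolio_variance(p_tail, alpha)
var_var = portfolio_variance(p_var, alpha)
```

With these numbers the minimum-variance weights have the smaller variance but the larger extreme-loss scale $1/\alpha^*$, while the weights of Eq. (3.61) give $1/\alpha^* = 1/\sum_i \alpha_i$.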


3.2.4 General case: optimal portfolio and VaR (*) 


In all of the cases treated above, the optimal portfolio is found to be independent of
the chosen loss level $\Lambda$. For example, in the case of assets with power-law tails, the
minimization of the loss probability amounts to minimizing the tail amplitude $A_p^{\mu}$,
independently of $\Lambda$. This property is however not true in general, and the optimal
portfolio does indeed depend on the risk level $\Lambda$, or, equivalently, on the temporal
horizon over which risk must be ‘tamed’. Let us for example consider the case
where all assets are power-law distributed, but with a tail index $\mu_i$ that depends on
the asset $X_i$. The probability that the portfolio $\mathbf{p}$ experiences a loss greater than $\Lambda$
is given, for large values of $\Lambda$, by:^13

$$ P(\delta S < -\Lambda) = \sum_{i=1}^{M} p_i^{\mu_i} \frac{A_i^{\mu_i}}{\Lambda^{\mu_i}}. \tag{3.62} $$

Looking for the set of $p_i$'s which minimizes the above expression (without
constraint on the average return) then leads to:

$$ p_i^* = \left( \frac{\zeta' \Lambda^{\mu_i}}{\mu_i A_i^{\mu_i}} \right)^{\frac{1}{\mu_i - 1}}, \tag{3.63} $$

where the Lagrange multiplier $\zeta'$ is fixed by the normalization $\sum_{i=1}^{M} p_i^* = 1$.
This example shows that in the general case, the weights $p_i^*$ explicitly depend on
the risk level $\Lambda$. If all the $\mu_i$'s are equal, $\Lambda^{\mu_i}$ factors out and disappears from the
$p_i$'s.
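The dependence of the optimal weights on the loss level can be seen numerically by minimizing Eq. (3.62) under the constraint $\sum_i p_i = 1$. The sketch below assumes the stationarity condition $\mu_i p_i^{\mu_i - 1} A_i^{\mu_i}/\Lambda^{\mu_i} = \zeta'$ and fixes the multiplier by bisection; the function name and the sample parameters are hypothetical.

```python
def weights_for_loss_level(A, mu, Lam):
    """Weights minimizing sum_i p_i^mu_i A_i^mu_i / Lam^mu_i with sum p_i = 1.

    A   : tail scales A_i
    mu  : tail exponents mu_i, all larger than 1
    Lam : loss level Lambda
    """
    def weights(zeta):
        # Stationarity condition solved for p_i at a given multiplier zeta.
        return [(zeta * Lam ** m / (m * a ** m)) ** (1.0 / (m - 1.0))
                for a, m in zip(A, mu)]

    # sum(weights(zeta)) grows with zeta: bracket the root, then bisect.
    lo, hi = 0.0, 1.0
    while sum(weights(hi)) < 1.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(weights(mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))


# Different exponents: the weights change with the loss level.
p_low = weights_for_loss_level([1.0, 1.0], [3.0, 5.0], Lam=5.0)
p_high = weights_for_loss_level([1.0, 1.0], [3.0, 5.0], Lam=20.0)

# Equal exponents: the loss level drops out of the weights.
q_low = weights_for_loss_level([1.0, 2.0], [3.0, 3.0], Lam=5.0)
q_high = weights_for_loss_level([1.0, 2.0], [3.0, 3.0], Lam=20.0)
```

In this example the asset with the fatter tail ($\mu_1 = 3$) receives a smaller weight when the loss level to be controlled is larger.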


Another interesting case is that of weakly non-Gaussian assets, such that the first
correction to the Gaussian distribution (proportional to the kurtosis $\kappa_i$ of the asset $X_i$)
is enough to describe faithfully the non-Gaussian effects. The variance of the full portfolio
is given by $D_p = \sum_{i=1}^{M} p_i^2 D_i$ while the kurtosis is equal to $\kappa_p = \sum_{i=1}^{M} p_i^4 D_i^2 \kappa_i / D_p^2$. The
probability that the portfolio plummets by an amount larger than $\Lambda$ is therefore given by:

$$ P(\delta S < -\Lambda) \simeq P_{G>}\left( \frac{\Lambda}{\sqrt{D_p T}} \right) + \frac{\kappa_p}{4!} \, h\left( \frac{\Lambda}{\sqrt{D_p T}} \right), \tag{3.64} $$

where $P_{G>}$ is related to the error function (cf. Section 1.6.3) and

$$ h(u) = \frac{\exp(-u^2/2)}{\sqrt{2\pi}} (u^3 - 3u). \tag{3.65} $$

To first order in kurtosis, one thus finds that the optimal weights $p_i^*$ (without fixing the
average return) are given by:

$$ p_i^* = \frac{\zeta'}{2 D_i} - \kappa_i \frac{\zeta'^3}{D_i^3} \, \tilde{h}\left( \frac{\Lambda}{\sqrt{D_p T}} \right), \tag{3.66} $$

where $\tilde{h}$ is another function, positive for large arguments, and $\zeta'$ is fixed by the condition
$\sum_{i=1}^{M} p_i = 1$. Hence, the optimal weights do depend on the risk level $\Lambda$, via the kurtosis
of the distribution. Furthermore, as could be expected, the minimization of extreme risks
leads to a reduction of the weights of the assets with a large kurtosis $\kappa_i$.


3.3 Portfolios of correlated assets 


The aim of the previous section was to introduce, in a somewhat simplified
context, the most important ideas underlying portfolio optimization, in a Gaussian
world (where the variance is minimized) or in a non-Gaussian world (where
the quantity of interest is the value-at-risk). In reality, the fluctuations of the
different assets are often strongly correlated (or anti-correlated). For example, an
increase of short-term interest rates often leads to a drop in share prices. All the
stocks of the New York Stock Exchange behave, to a certain extent, similarly.
These correlations of course modify completely the composition of the optimal
portfolios, and actually make diversification more difficult. In a sense, the number
of effectively independent assets is decreased from the true number of assets $M$.

13 The following expression is valid only when the subleading corrections to Eq. (3.47) can safely be neglected.


3.3.1 Correlated Gaussian fluctuations 


Let us first consider the case where all the fluctuations $\delta x_i$ of the assets $X_i$ are
Gaussian, but with arbitrary correlations. These correlations are described in terms
of a (symmetric) correlation matrix $C_{ij}$, defined as:

$$ C_{ij} = \langle \delta x_i \delta x_j \rangle - m_i m_j. \tag{3.67} $$

This means that the joint distribution of all the fluctuations $\delta x_1, \delta x_2, \ldots, \delta x_M$ is
given by:

$$ P(\delta x_1, \delta x_2, \ldots, \delta x_M) \propto \exp\left[ -\frac{1}{2} \sum_{ij} (\delta x_i - m_i)(C^{-1})_{ij}(\delta x_j - m_j) \right], \tag{3.68} $$

where the proportionality factor is fixed by normalization and is equal to
$1/\sqrt{(2\pi)^M \det C}$, and $(C^{-1})_{ij}$ denotes the elements of the matrix inverse of $C$.
An important property of correlated Gaussian variables is that they can be
decomposed into a weighted sum of independent Gaussian variables $e_a$, of mean
zero and variance equal to $D_a$:

$$ \delta x_i = m_i + \sum_{a=1}^{M} O_{ia} e_a, \qquad \langle e_a e_b \rangle = \delta_{a,b} D_a. \tag{3.69} $$

The $\{e_a\}$ are usually referred to as the ‘explicative factors’ (or principal
components) for the asset fluctuations. They sometimes have a simple economic
interpretation.

The coefficients $O_{ia}$ give the weight of the factor $e_a$ in the evolution of the asset
$X_i$. These can be related to the correlation matrix $C_{ij}$ by using the fact that the
$\{e_a\}$'s are independent. This leads to:

$$ C_{ij} = \sum_{a,b=1}^{M} O_{ia} O_{jb} \langle e_a e_b \rangle = \sum_{a=1}^{M} O_{ia} O_{ja} D_a, \tag{3.70} $$

or, seen as a matrix equality: $C = O D O^{\dagger}$, where $O^{\dagger}$ denotes the matrix transposed
of $O$ and $D$ the diagonal matrix obtained from the $D_a$'s. This last expression shows
that the $D_a$'s are the eigenvalues of the matrix $C_{ij}$, whereas $O$ is the orthogonal
matrix allowing one to go from the set of assets $i$'s to the set of explicative factors
$e_a$.
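The decomposition $C = O D O^{\dagger}$ is an ordinary eigendecomposition of a symmetric matrix, which can be sketched with NumPy's `numpy.linalg.eigh`. The covariance matrix and the weights below are hypothetical numbers used only for illustration.

```python
import numpy as np

# Hypothetical covariance matrix of three assets (symmetric, positive definite).
C = np.array([[0.0400, 0.0180, 0.0060],
              [0.0180, 0.0900, 0.0120],
              [0.0060, 0.0120, 0.0225]])

# eigh returns the eigenvalues D_a (ascending) and an orthogonal matrix O
# whose columns are the eigenvectors, i.e. the explicative factors.
D, O = np.linalg.eigh(C)

# Matrix form of Eq. (3.70): C = O D O^T.
C_rebuilt = O @ np.diag(D) @ O.T

# The variance of a portfolio is the same in the asset basis and in the
# basis of independent factors, where it is a plain sum of variances.
p = np.array([0.5, 0.2, 0.3])
var_assets = p @ C @ p
var_factors = np.sum((O.T @ p) ** 2 * D)
```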

The fluctuations $\delta S$ of the global portfolio $\mathbf{p}$ are then also Gaussian (since $\delta S$ is a
weighted sum of the Gaussian variables $e_a$), of mean $m_p = \sum_{i=1}^{M} p_i (m_i - m_0) + m_0$
and variance:

$$ D_p = \sum_{i,j=1}^{M} p_i p_j C_{ij}. \tag{3.71} $$

The minimization of $D_p$ for a fixed value of the average return $m_p$ (and with the
possibility of including the risk-free asset $X_0$) leads to an equation generalizing Eq.
(3.38):

$$ 2 \sum_{j=1}^{M} C_{ij} p_j^* = \zeta (m_i - m_0), \tag{3.72} $$

which can be inverted as:

$$ p_i^* = \frac{\zeta}{2} \sum_{j=1}^{M} C_{ij}^{-1} (m_j - m_0). \tag{3.73} $$

This is Markowitz’s classical result (cf. [Markowitz, Elton and Gruber]).
In the case where the risk-free asset is excluded, the minimum variance portfolio
is given by:

$$ p_i^* = \frac{1}{Z} \sum_{j=1}^{M} C_{ij}^{-1}, \qquad Z = \sum_{i,j=1}^{M} C_{ij}^{-1}. \tag{3.74} $$

Actually, the decomposition, Eq. (3.69), shows that, provided one shifts to the
basis where all assets are independent (through a linear combination of the original
assets), all the results obtained above in the case where the correlations are absent
(such as the existence of an efficient border, etc.) are still valid when correlations
are present.

In the more general case of non-Gaussian assets of finite variance, the total
variance of the portfolio is still given by $\sum_{i,j=1}^{M} p_i p_j C_{ij}$, where $C_{ij}$ is the
correlation matrix. If the variance is an adequate measure of risk, the composition
of the optimal portfolio is still given by Eqs (3.73) and (3.74). Let us however
again emphasize that, as discussed in Section 2.7, the empirical determination of
the correlation matrix $C_{ij}$ is difficult, in particular when one is concerned with the
small eigenvalues of this matrix and their corresponding eigenvectors.
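Equations (3.73) and (3.74) translate directly into a few NumPy calls. The following is a minimal sketch under stated assumptions: the function names and the numerical inputs are hypothetical, and $\zeta$ is fixed by requiring $m_p - m_0 = \sum_i p_i^* (m_i - m_0)$, the correlated analogue of Eq. (3.39).

```python
import numpy as np


def markowitz_weights(C, m, m0, m_target):
    """Risky-asset weights of Eq. (3.73) for a required average return."""
    excess = m - m0
    x = np.linalg.solve(C, excess)              # C^{-1} (m - m0)
    zeta = 2.0 * (m_target - m0) / (excess @ x)
    p = 0.5 * zeta * x                          # Eq. (3.73)
    p0 = 1.0 - p.sum()                          # weight of the risk-free asset
    return p, p0, zeta


def minimum_variance_weights(C):
    """Minimum-variance portfolio without risk-free asset, Eq. (3.74)."""
    x = np.linalg.solve(C, np.ones(len(C)))     # C^{-1} 1
    Z = x.sum()
    return x / Z, Z


# Hypothetical inputs.
C = np.array([[0.0400, 0.0180, 0.0060],
              [0.0180, 0.0900, 0.0120],
              [0.0060, 0.0120, 0.0225]])
m = np.array([0.08, 0.12, 0.10])
m0 = 0.03

p, p0, zeta = markowitz_weights(C, m, m0, m_target=0.07)
p_min, Z = minimum_variance_weights(C)
```

Solving the linear system with `numpy.linalg.solve` avoids forming $C^{-1}$ explicitly, which matters in practice because the small eigenvalues of an empirical $C$ are poorly determined.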

The ideas developed in Section 2.7 can actually be used in practice to reduce the real risk
of optimized portfolios. Since the eigenstates corresponding to the ‘noise band’ are not
expected to contain real information, one should not distinguish the different eigenvalues
and eigenvectors in this sector. This amounts to replacing the restriction of the empirical
correlation matrix to the noise band subspace by the identity matrix with a coefficient such
that the trace of the matrix is conserved (i.e. suppressing the measurement broadening
due to a finite observation time). This ‘cleaned’ correlation matrix, where the noise has
been (at least partially) removed, is then used to construct an optimal portfolio. We have
implemented this idea in practice as follows. Using the same data sets as above, the
total available period of time has been divided into two equal sub-periods. We determine
the correlation matrix using the first sub-period, ‘clean’ it, and construct the family of
optimal portfolios and the corresponding efficient frontiers. Here we assume that the
investor has perfect predictions on the future average returns $m_i$, i.e. we take for $m_i$ the
observed return on the next sub-period. The results are shown in Figure 3.8: one sees very
clearly that using the empirical correlation matrix leads to a dramatic underestimation
of the real risk, by over-investing in artificially low-risk eigenvectors. The risk of the
optimized portfolio obtained using a cleaned correlation matrix is more reliable, although
the real risk is always larger than the predicted one. This comes from the fact that any
amount of uncertainty in the correlation matrix produces, through the very optimization
procedure, a bias towards low-risk portfolios. This can be checked by permuting the two
sub-periods: one then finds nearly identical efficient frontiers. (This is expected, since for
large correlation matrices these frontiers should be self-averaging.) In other words, even if
the cleaned correlation matrices are more stable in time than the empirical correlation
matrices, they do not perfectly reflect future correlations. This might be due to a
combination of remaining noise and of a genuine time dependence in the structure of the
meaningful correlations.
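The cleaning step described above can be sketched as follows. This is not the authors' implementation: the function name `clean_correlation` is hypothetical, the data are synthetic, and the upper edge of the noise band is taken here, as an assumption, to be the random-matrix value $(1 + \sqrt{M/T})^2$ for $M$ assets observed over $T$ periods.

```python
import numpy as np


def clean_correlation(C, lambda_max):
    """Flatten the eigenvalues inside the noise band, keeping the trace.

    Eigenvalues not exceeding lambda_max are replaced by their mean, which
    amounts to replacing C by a multiple of the identity on that subspace.
    """
    eigvals, eigvecs = np.linalg.eigh(C)
    noise = eigvals <= lambda_max
    cleaned = eigvals.copy()
    if noise.any():
        cleaned[noise] = eigvals[noise].mean()
    return eigvecs @ np.diag(cleaned) @ eigvecs.T


# Synthetic example: M = 20 assets sharing one common factor, T = 200 days.
rng = np.random.default_rng(0)
M, T = 20, 200
returns = rng.standard_normal((T, M)) + 0.5 * rng.standard_normal((T, 1))
C_emp = np.corrcoef(returns, rowvar=False)

lambda_max = (1.0 + np.sqrt(M / T)) ** 2   # assumed edge of the noise band
C_clean = clean_correlation(C_emp, lambda_max)
```

The cleaned matrix keeps the large ‘market’ eigenvalue and its eigenvector untouched, while the many small eigenvalues that an optimizer would otherwise over-invest in are replaced by a single average value.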


The CAPM and its limitations 


Within the above framework, all optimal portfolios are proportional to one another,
that is, they only differ through the choice of the factor $\zeta$. Since the problem is
linear, this means that the linear superposition of optimal portfolios is still optimal.
If all the agents on the market choose their portfolio using this optimization
scheme (with the same values for the average return and the correlation
coefficients, clearly quite an absurd hypothesis), then the ‘market portfolio’ (i.e. the
one obtained by taking all assets in proportion of their market capitalization) is an
optimal portfolio. This remark is at the origin of the ‘CAPM’ (Capital Asset Pricing
Model), which aims at relating the average return of an asset with its covariance
with the ‘market portfolio’. Actually, for any optimal portfolio $\mathbf{p}$, one can express
$m_p - m_0$ in terms of the $p_i^*$, and use Eq. (3.73) to eliminate $\zeta$, to obtain the following
equality:

$$ m_i - m_0 = \beta_i [m_p - m_0], \qquad \beta_i \equiv \frac{\langle (\delta x_i - m_i)(\delta S - m_p) \rangle}{\langle (\delta S - m_p)^2 \rangle}. \tag{3.75} $$

The covariance coefficient $\beta_i$ is often called the ‘$\beta$’ of asset $i$ when $\mathbf{p}$ is the market
portfolio.
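Relation (3.75) can be verified numerically for any portfolio built from Eq. (3.73). In the sketch below, the covariance matrix, the returns and the value of $\zeta$ are hypothetical; the covariance of asset $i$ with the portfolio is $\sum_j C_{ij} p_j$ and the portfolio variance is $\sum_{ij} p_i p_j C_{ij}$.

```python
import numpy as np

# Hypothetical market data.
C = np.array([[0.0400, 0.0180, 0.0060],
              [0.0180, 0.0900, 0.0120],
              [0.0060, 0.0120, 0.0225]])
m = np.array([0.08, 0.12, 0.10])
m0 = 0.03

# Any optimal portfolio of Eq. (3.73); zeta only sets the overall scale.
zeta = 0.8
p = 0.5 * zeta * np.linalg.solve(C, m - m0)
m_p = m0 + p @ (m - m0)          # the remainder 1 - sum(p) earns m0

# beta_i = cov(dx_i, dS) / var(dS), Eq. (3.75).
beta = (C @ p) / (p @ C @ p)

capm_lhs = m - m0
capm_rhs = beta * (m_p - m0)
```

Changing `zeta` rescales `p` but leaves the equality between `capm_lhs` and `capm_rhs` intact, which is the proportionality property the CAPM argument relies on.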


This relation is however not true for other definitions of optimal portfolios. Let us define
the generalized kurtosis $K_{ijkl}$ that measures the first correction to Gaussian statistics, from
the joint distribution of the asset fluctuations:

$$ P(\delta x_1, \delta x_2, \ldots, \delta x_M) = \int\!\!\int \cdots \int \exp\Big[ -i \sum_j z_j (\delta x_j - m_j) \tag{3.76} $$

$$ \qquad - \frac{1}{2} \sum_{ij} z_i C_{ij} z_j + \frac{1}{4!} \sum_{ijkl} K_{ijkl} z_i z_j z_k z_l + \cdots \Big] \prod_{j=1}^{M} \frac{dz_j}{2\pi}. \tag{3.77} $$

If one tries to minimize the probability that the loss is greater than a certain $\Lambda$, a
generalization of the calculation presented above (cf. Eq. (3.66)) leads to:

$$ p_i^* = \frac{\zeta'}{2} \sum_{j=1}^{M} C_{ij}^{-1} (m_j - m_0) - \zeta'^3 \, \tilde{h}\left( \frac{\Lambda}{\sqrt{D_p T}} \right) \sum_{j,k,l=1}^{M} \sum_{j',k',l'=1}^{M} K_{ijkl} C_{jj'}^{-1} C_{kk'}^{-1} C_{ll'}^{-1} (m_{j'} - m_0)(m_{k'} - m_0)(m_{l'} - m_0), \tag{3.78} $$

where $\tilde{h}$ is a certain positive function. The important point here is that the level of risk $\Lambda$
appears explicitly in the choice of the optimal portfolio. If the different operators choose
different levels of risk, their optimal portfolios are no longer related by a proportionality
factor, and the above CAPM relation does not hold.

Fig. 3.8. Efficient frontiers from Markowitz optimization, in the return versus volatility
plane. The leftmost dotted curve corresponds to the classical Markowitz case using the
empirical correlation matrix; the rightmost short-dashed curve is the realization of the same
portfolio in the second time period (the risk is underestimated by a factor of 3!). The
central curves (plain and long-dashed) represent the case of a cleaned correlation matrix.
The realized risk is now only a factor of 1.5 larger than the predicted risk.


3.3.2 ‘Power-law’ fluctuations (*) 


The minimization of large risks also requires, as in the Gaussian case detailed
above, the knowledge of the correlations between large events, and therefore an
adapted measure of these correlations. Now, in the extreme case $\mu < 2$, the
covariance (like the variance) is infinite. A suitable generalization of the covariance
is therefore necessary. Even in the case $\mu > 2$, where this covariance is a priori
finite, the value of this covariance is a mix of the correlations between large
negative moves, large positive moves, and all the ‘central’ (i.e. not so large) events.
Again, the definition of a ‘tail covariance’, directly sensitive to the large negative
events, is needed. The aim of the present section is to define such a quantity, which
is a natural generalization of the covariance for power-law distributions, much
as the ‘tail amplitude’ is a generalization of the variance. In a second part, the
minimization of the value-at-risk of a portfolio will be discussed.


‘Tail covariance’ 


Let us again assume that the $\delta x_i$'s are distributed according to:

$$ P_T(\delta x_i) \underset{\delta x_i \to \pm\infty}{\simeq} \frac{\mu A_i^{\mu}}{|\delta x_i|^{1+\mu}}. \tag{3.79} $$

A natural way to describe the correlations between large events is to generalize the
decomposition in independent factors used in the Gaussian case, Eq. (3.69), and to
write:

$$ \delta x_i = m_i + \sum_{a=1}^{M} O_{ia} e_a, \tag{3.80} $$

where the $e_a$ are independent power-law random variables, the distribution of
which is:

$$ P(e_a) \underset{e_a \to \pm\infty}{\simeq} \frac{\mu A_a^{\mu}}{|e_a|^{1+\mu}}. \tag{3.81} $$

Since power-law variables are (asymptotically) stable under addition, the
decomposition Eq. (3.80) indeed leads for all $\mu$ to correlated power-law variables $\delta x_i$.

The usual definition of the covariance is related to the average value of $\delta x_i \delta x_j$,
which can in some cases be divergent (i.e. when $\mu < 2$). The idea is then to study
directly the characteristic function of the product variable $\pi_{ij} = \delta x_i \delta x_j$.^14 The
justification is the following: if $e_a$ and $e_b$ are two independent power-law variables,
their product $\pi_{ab}$ is also a power-law variable (up to logarithmic corrections) with
the same exponent $\mu$ (cf. Appendix C):

$$ P(\pi_{ab}) \underset{\pi_{ab} \to \infty}{\simeq} \frac{\mu^2 (A_a^{\mu} A_b^{\mu}) \log(|\pi_{ab}|)}{|\pi_{ab}|^{1+\mu}} \quad (a \neq b). \tag{3.82} $$

On the contrary, the quantity $\pi_{aa}$ is distributed as a power-law with an exponent
$\mu/2$ (cf. Appendix C):

$$ P(\pi_{aa}) \underset{\pi_{aa} \to \infty}{\simeq} \frac{\mu A_a^{\mu}}{2 |\pi_{aa}|^{1+\mu/2}}. \tag{3.83} $$

Hence, the variable $\pi_{ij}$ gets both ‘non-diagonal’ contributions $\pi_{ab}$ and ‘diagonal’
ones $\pi_{aa}$. For $\pi_{ij} \to \infty$, however, only the latter survive. Therefore $\pi_{ij}$ is a
power-law variable of exponent $\mu/2$, with a tail amplitude that we will note $A_{ij}^{\mu/2}$, and
an asymmetry coefficient $\beta_{ij}$ (see Section 1.3.3). Using the additivity of the tail
amplitudes, and the results of Appendix C, one finds:

$$ A_{ij}^{\mu/2} = \sum_{a=1}^{M} |O_{ia} O_{ja}|^{\mu/2} A_a^{\mu}, \tag{3.84} $$

and

$$ C_{ij}^{\mu/2} \equiv \beta_{ij} A_{ij}^{\mu/2} = \sum_{a=1}^{M} \mathrm{sign}(O_{ia} O_{ja}) |O_{ia} O_{ja}|^{\mu/2} A_a^{\mu}. \tag{3.85} $$

In the limit $\mu = 2$, one sees that the quantity $C_{ij}^{\mu/2}$ reduces to the standard
covariance, provided one identifies $A_a^2$ with the variance of the explicative factors
$D_a$. This suggests that $C_{ij}^{\mu/2}$ is the suitable generalization of the covariance
for power-law variables, which is constructed using extreme events only. The
expression Eq. (3.85) furthermore shows that the matrix $\tilde{O}_{ia} = \mathrm{sign}(O_{ia}) |O_{ia}|^{\mu/2}$
allows one to diagonalize the ‘tail covariance matrix’ $C_{ij}^{\mu/2}$;^15 its eigenvalues are
given by the $A_a^{\mu}$'s.

14 Other generalizations of the covariance have been proposed in the context of Lévy processes, such as the
‘covariation’ [Samorodnitsky and Taqqu].

In summary, the tail covariance matrix $C_{ij}^{\mu/2}$ is obtained by studying the
asymptotic behaviour of the product variable $\delta x_i \delta x_j$, which is a power-law variable
of exponent $\mu/2$. The product of its tail amplitude and of its asymmetry coefficient
is our definition of the tail covariance $C_{ij}^{\mu/2}$.
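Equation (3.85) can be written compactly in matrix form as $C^{\mu/2} = \tilde{O} A \tilde{O}^{\dagger}$, with $A$ the diagonal matrix of the $A_a^{\mu}$. The NumPy sketch below builds the tail covariance from a given factor matrix; the function name, the matrix `O` and the amplitudes are hypothetical.

```python
import numpy as np


def tail_covariance(O, A_mu, mu):
    """Tail covariance matrix of Eq. (3.85).

    O    : factor loadings O_ia (rows: assets, columns: factors)
    A_mu : tail amplitudes A_a^mu of the independent factors
    mu   : tail exponent
    """
    # Element-wise transform O_tilde_ia = sign(O_ia) |O_ia|^(mu/2).
    O_tilde = np.sign(O) * np.abs(O) ** (mu / 2.0)
    return O_tilde @ np.diag(A_mu) @ O_tilde.T


# Hypothetical two-asset, two-factor example (O is a rotation matrix).
O = np.array([[0.8, -0.6],
              [0.6, 0.8]])
A_mu = np.array([1.0, 4.0])

C_tail = tail_covariance(O, A_mu, mu=1.5)
C_gauss = tail_covariance(O, A_mu, mu=2.0)   # ordinary covariance O D O^T
```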


15 Note that $\tilde{O} = O$ for $\mu = 2$.

Optimal portfolio


It is now possible to find the optimal portfolio that minimizes the loss probability.
The fluctuations of the portfolio $\mathbf{p}$ can indeed be written as:

$$ \delta S = \sum_{i=1}^{M} p_i \delta x_i = \sum_{a=1}^{M} \left( \sum_{i=1}^{M} p_i O_{ia} \right) e_a. \tag{3.86} $$

Due to our assumption that the $e_a$ are symmetric power-law variables, $\delta S$ is also a
symmetric power-law variable with a tail amplitude given by:

$$ A_p^{\mu} = \sum_{a=1}^{M} \left| \sum_{i=1}^{M} p_i O_{ia} \right|^{\mu} A_a^{\mu}. \tag{3.87} $$

In the simple case where one tries to minimize the tail amplitude $A_p^{\mu}$ without any
constraint on the average return, one then finds:^16

$$ \sum_{a=1}^{M} O_{ia} A_a^{\mu} V_a^* = \frac{\zeta'}{\mu}, \tag{3.88} $$

where the vector $V_a^* = \mathrm{sign}(\sum_{j=1}^{M} O_{ja} p_j^*) \, |\sum_{j=1}^{M} O_{ja} p_j^*|^{\mu-1}$. Once the tail
covariance matrix $C^{\mu/2}$ is known, the optimal weights $p_i^*$ are thus determined as follows:


(i) The diagonalization of $C^{\mu/2}$ gives the rotation matrix $\tilde{O}$, and therefore one
can construct the matrix $O = \mathrm{sign}(\tilde{O}) |\tilde{O}|^{2/\mu}$ (understood for each element
of the matrix), and the diagonal matrix $A$.

(ii) The matrix $OA$ is then inverted, and applied to the vector $(\zeta'/\mu)\mathbf{1}$. This
gives the vector $V^* = (\zeta'/\mu)(OA)^{-1}\mathbf{1}$.

(iii) From the vector $V^*$ one can then obtain the weights $p_i^*$ by applying the
matrix inverse of $O^{\dagger}$ to the vector $\mathrm{sign}(V^*) |V^*|^{1/(\mu-1)}$.

(iv) The Lagrange multiplier $\zeta'$ is then determined such that $\sum_{i=1}^{M} p_i^* = 1$.


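Rather symbolically, one thus has:

$$ \mathbf{p}^* = (O^{\dagger})^{-1} \left( \frac{\zeta'}{\mu} (OA)^{-1} \mathbf{1} \right)^{\frac{1}{\mu-1}}. \tag{3.89} $$

In the case $\mu = 2$, one recovers the previous prescription, since in that case: $OA = OD =
CO$, $(OA)^{-1} = O^{-1}C^{-1}$, and $O^{\dagger} = O^{-1}$, from which one gets:

$$ \mathbf{p}^* = \frac{\zeta'}{2} O O^{-1} C^{-1} \mathbf{1} = \frac{\zeta'}{2} C^{-1} \mathbf{1}, \tag{3.90} $$

which coincides with Eq. (3.74).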
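Steps (ii) to (iv) can be sketched in NumPy once the factor matrix $O$ and the amplitudes $A_a^{\mu}$ are known (step (i) is taken as already done). The function name and the inputs are hypothetical. Because the weights scale as a power of $\zeta'$, the sketch sets $\zeta'/\mu = 1$ and enforces the normalization by a final rescaling.

```python
import numpy as np


def tail_optimal_portfolio(O, A_mu, mu):
    """Weights minimizing the tail amplitude of Eq. (3.87), with sum p_i = 1."""
    ones = np.ones(O.shape[0])
    # Step (ii): V = (O A)^{-1} 1, with zeta'/mu set to one for now.
    V = np.linalg.solve(O @ np.diag(A_mu), ones)
    # Step (iii): invert V_a = sign(x_a)|x_a|^(mu-1), where x = O^T p.
    x = np.sign(V) * np.abs(V) ** (1.0 / (mu - 1.0))
    p = np.linalg.solve(O.T, x)
    # Step (iv): fixing zeta' amounts to rescaling p to unit total weight.
    return p / p.sum()


# Hypothetical two-asset example.
O = np.array([[0.8, -0.6],
              [0.6, 0.8]])
A_mu = np.array([1.0, 4.0])

p_tail = tail_optimal_portfolio(O, A_mu, mu=1.5)
p_gauss = tail_optimal_portfolio(O, A_mu, mu=2.0)   # reduces to Eq. (3.74)
```

A useful check is that the left-hand side of Eq. (3.88), evaluated with the returned weights, takes the same value for every asset $i$.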


16 If one wants to impose a non-zero average return, one should replace $\zeta'/\mu$ by $\zeta'/\mu + \zeta m_i/\mu$, where $\zeta$ is
determined such as to give to $m_p$ the required value.

The problem of the minimization of the value-at-risk in a strongly non-Gaussian
world, where distributions behave asymptotically as power-laws, and in the
presence of correlations between tail events, can thus be solved using a procedure rather
similar to the one proposed by Markowitz for Gaussian assets.


3.4 Optimized trading (*) 
In this section, we will discuss a slightly different problem of portfolio
optimization: how to dynamically optimize, as a function of time, the number of shares of a
given stock that one holds in order to minimize the risk for a fixed level of return.
We shall actually encounter a similar problem in the next chapter on options when
the question of the optimal hedging strategy will be addressed. In fact, many of the
notations and techniques of the present section are borrowed from Chapter 4. The
optimized strategy found below shows that in order to minimize the variance, the
time-dependent part of the optimal strategy consists in selling when the price goes
up and buying when it goes down. However, this strategy increases the probability
of very large losses!
We will suppose that the trader holds a certain number of shares $\phi_n(x_n)$, where
$\phi$ depends both on the (discrete) time $t_n = n\tau$, and on the price of the stock $x_n$. For
the time being, we assume that the interest rates are negligible and that the change
of wealth is given by:

$$\Delta W_X = \sum_{k=0}^{N-1}\phi_k(x_k)\,\delta x_k, \tag{3.91}$$

where $\delta x_k = x_{k+1} - x_k$. (See Sections 4.1 and 4.2 for a more detailed discussion of
Eq. (3.91).) Let us define the gain $G = \langle\Delta W_X\rangle$ as the average final wealth, and the
risk $R^2$ as the variance of the final wealth:

$$R^2 = \langle\Delta W_X^2\rangle - G^2. \tag{3.92}$$


The question is then to find the optimal trading strategy $\phi_k^*(x_k)$, such that the risk
$R$ is minimized, for a given value of $G$. Introducing a Lagrange multiplier $\zeta$, one
thus looks for the (functional) solution of the following equation:¹⁷

$$\frac{\delta}{\delta\phi_k(x)}\left[R^2 - \zeta G^2\right]\Big|_{\phi_k=\phi_k^*} = 0. \tag{3.93}$$

Now, we further assume that the price increments $\delta x_k$ are independent random
variables of mean $m\tau$ and variance $D\tau$. Introducing the notation $P(x,k|x_0,0)$ for
the probability that the price is equal to $x$ at time $k\tau$, knowing that it is equal to $x_0$ at time $t = 0$,


17 The following equation results from a functional minimization of the risk. See Section 4.4.3 for further details 
one has:

$$G = \sum_{k=0}^{N-1}\langle\phi_k(x_k)\rangle\langle\delta x_k\rangle = m\tau\sum_{k=0}^{N-1}\int P(x,k|x_0,0)\,\phi_k(x)\,dx, \tag{3.94}$$


where the factorization of the average values holds because of the absence of
correlations between the value of $x_k$ (and thus that of $\phi_k$), and the value of the
increment $\delta x_k$. The calculation of $\langle\Delta W_X^2\rangle$ is somewhat more involved, in particular
because averages of the type $\langle\delta x_\ell\,\phi_k(x_k)\rangle$, with $k > \ell$, do appear: in this case one
cannot neglect the correlations between $\delta x_\ell$ and $x_k$. Using the results of Appendix
D, one finally finds:

$$\begin{aligned}
\langle\Delta W_X^2\rangle = {} & D\tau\sum_{k=0}^{N-1}\int P(x,k|x_0,0)\,\phi_k^2(x)\,dx \\
& + m\tau\sum_{k=0}^{N-1}\sum_{\ell=0}^{k-1}\iint P(x',\ell|x_0,0)\,P(x,k|x',\ell)\,\phi_\ell(x')\,\phi_k(x)\,\frac{x-x'}{k-\ell}\,dx\,dx'.
\end{aligned} \tag{3.95}$$


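Taking the derivative of $R^2 - \zeta G^2$ with respect to $\phi_k(x)$, one finds:

$$\begin{aligned}
& 2D\tau\,P(x,k|x_0,0)\,\phi_k(x) - 2(1+\zeta)\,m\tau\,P(x,k|x_0,0)\,G \\
& + m\tau\,P(x,k|x_0,0)\sum_{\ell=k+1}^{N-1}\int P(x',\ell|x,k)\,\phi_\ell(x')\,\frac{x'-x}{\ell-k}\,dx' \\
& + m\tau\sum_{\ell=0}^{k-1}\int P(x',\ell|x_0,0)\,P(x,k|x',\ell)\,\phi_\ell(x')\,\frac{x-x'}{k-\ell}\,dx'.
\end{aligned} \tag{3.96}$$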

Setting this expression to zero gives an implicit equation for the optimal strategy
$\phi_k^*$. A solution to this equation can be found, for $m$ small, as a power series in $m$.
Looking for a reasonable return means that $G$ should be of order $m$. Therefore we
set $G = G_0mT$, with $T = N\tau$, and expand $\phi^*$ and $1+\zeta$ as:

$$\phi_k^* = \phi_k^0 + m\,\phi_k^1 + \cdots \quad\text{and}\quad 1+\zeta = \frac{\zeta_0}{m^2} + \frac{\zeta_1}{m} + \cdots \tag{3.97}$$

Inserting these expressions in Eq. (3.96) leads, to zeroth order, to a
time-independent strategy:

$$\phi_k^0(x) = \phi_0 = \frac{\zeta_0G_0T}{D}. \tag{3.98}$$

The Lagrange multiplier $\zeta_0$ is then fixed by the equation:

$$G = G_0mT = mT\phi_0, \quad\text{leading to}\quad \zeta_0 = \frac{D}{T} \quad\text{and}\quad \phi_0 = G_0. \tag{3.99}$$
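To first order, the equation on $\phi_k^1$ reads:

$$\begin{aligned}
D\phi_k^1 = {} & \zeta_1G_0T - \frac{\phi_0}{2}\sum_{\ell=k+1}^{N-1}\int P(x',\ell|x,k)\,\frac{x'-x}{\ell-k}\,dx' \\
& - \frac{\phi_0}{2}\sum_{\ell=0}^{k-1}\int\frac{P(x',\ell|x_0,0)\,P(x,k|x',\ell)}{P(x,k|x_0,0)}\,\frac{x-x'}{k-\ell}\,dx'.
\end{aligned} \tag{3.100}$$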


The second term on the right-hand side is of order m, and thus negligible to 
this order. Interestingly, the last term can be evaluated explicitly without any 
assumption on the detailed shape for the probability distribution of the price 
increments. Using the method of Appendix D, one can show that for independent 
increments: 


$$\int(x'-x_0)\,P(x',\ell|x_0,0)\,P(x,k|x',\ell)\,dx' = \frac{\ell}{k}\,(x-x_0)\,P(x,k|x_0,0). \tag{3.101}$$


Therefore, one finally finds: 


$$\phi_k^1(x) = \frac{\zeta_1G_0T}{D} - \frac{G_0}{2D}\,(x-x_0). \tag{3.102}$$


This equation shows that in order to minimize the risk as measured by the variance,
a trader should sell stocks when their price increases, and buy more stocks when
their price decreases, proportionally to $m(x-x_0)$. The value of $\zeta_1$ is fixed such
that:

$$\sum_{k=0}^{N-1}\int P(x,k|x_0,0)\,\phi_k^1(x)\,dx = 0, \tag{3.103}$$
k=0 


which leads to $\zeta_1 = 0$ (plus order $m$ corrections). However, it can be shown that
this strategy increases the VaR. For example, if the increments are Gaussian, the
left tail of the distribution of $\Delta W_X$ (corresponding to large losses), using the above
strategy, is found to be exponential, and therefore much broader than the Gaussian
expected for a time-independent strategy:

$$P_<(\Delta W_X) \underset{\Delta W_X\to-\infty}{\simeq} \exp\left(-\frac{2|\Delta W_X|}{G}\right). \tag{3.104}$$
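As a numerical illustration of this trade-off, the following minimal Python sketch (not part of the original text) simulates Gaussian increments and compares a constant holding $\phi_k = G_0$ with the first-order strategy $\phi_k = G_0 - m\,(G_0/2D)(x_k - x_0)$, that is Eq. (3.102) with $\zeta_1 = 0$. The function name `simulate_strategies`, the use of NumPy and all parameter values are assumptions chosen for illustration; the drift is deliberately large so that the order-$m^2$ effect stands out of the sampling noise.

```python
import numpy as np

def simulate_strategies(n_paths=50_000, n_steps=50, m=0.5, D=1.0,
                        T=1.0, G0=1.0, seed=0):
    """Return final wealth changes for the constant and the price-dependent strategy.

    All parameter values are illustrative assumptions, not taken from the text.
    """
    rng = np.random.default_rng(seed)
    tau = T / n_steps
    # Independent Gaussian increments of mean m*tau and variance D*tau.
    dx = rng.normal(m * tau, np.sqrt(D * tau), size=(n_paths, n_steps))
    # x_k - x_0 as known *before* the k-th increment (zero for k = 0).
    displacement = np.cumsum(dx, axis=1) - dx
    # Time-independent strategy phi_k = G0.
    w_const = G0 * dx.sum(axis=1)
    # First-order strategy: hold less after a rise, more after a fall.
    phi = G0 - m * G0 / (2.0 * D) * displacement
    w_opt = (phi * dx).sum(axis=1)
    return w_const, w_opt

if __name__ == "__main__":
    w_const, w_opt = simulate_strategies()
    for name, w in (("constant", w_const), ("price-dependent", w_opt)):
        print(f"{name:16s} mean={w.mean():.3f} var={w.var():.3f} "
              f"1% quantile={np.quantile(w, 0.01):.3f}")
```

With these settings one would expect the price-dependent strategy to show a smaller variance (also relative to the squared mean gain) but a more negative 1% quantile, in line with the remark that reducing the variance in this way broadens the loss tail.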


3.5 Conclusion of the chapter 


In a Gaussian world, all measures of risk are equivalent. Minimizing the variance or
the probability of large losses (or the value-at-risk) leads to the same result, which
is the family of optimal portfolios obtained by Markowitz. In the general case,
however, the VaR minimization leads to portfolios which are different from those of
Markowitz. These portfolios depend on the level of risk $\Lambda$ (or on the time horizon
‘Les personnes non averties sont sujettes à se laisser induire en erreur.’¹


(Lord Raglan, ‘Le tabou de l’inceste’, quoted by Boris Vian in L’automne à Pékin.)


4.1 Introduction 
4.1.1 Aim of the chapter 


The aim of this chapter is to introduce the general theory of derivative pricing 
in a simple and intuitive, but rather unconventional, way. The usual presentation,
which can be found in all the available books on the subject,² relies on particular
models where it is possible to construct riskless hedging strategies, which replicate
exactly the corresponding derivative product.³ Since the risk is strictly zero, there
is no ambiguity in the price of the derivative: it is equal to the cost of the
hedging strategy. In the general case, however, these ‘perfect’ strategies do not
exist. Not surprisingly for the layman, zero risk is the exception rather than the
rule. Correspondingly, a suitable theory must include risk as an essential feature,
which one would like to minimize. The present chapter thus aims at developing
simple methods to obtain optimal strategies, residual risks, and prices of derivative
products, which take into account in an adequate way the peculiar statistical nature
of financial markets, as described in Chapter 2. 


4.1.2 Trading strategies and efficient markets 
In the previous chapters, we have insisted on the fact that if the detailed prediction 
of future market moves is probably impossible, its statistical description is a 
reasonable and useful idea, at least as a first approximation. This approach only 
1 Unwarned people may easily be fooled.

2 See e.g. [Hull, Wilmott, Baxter].

3 A hedging strategy is a trading strategy allowing one to reduce, and sometimes eliminate, the risk.
relies on a certain degree of stability (in time) in the way markets behave and
the prices evolve.⁴ Let us thus assume that one can determine (using a statistical
analysis of past time series) the probability density $P(x,t|x_0,t_0)$, which gives the
probability that the price of the asset X is equal to $x$ (to within $dx$) at time $t$,
knowing that at a previous time $t_0$, the price was equal to $x_0$. As in previous
chapters, we shall denote as $\langle O\rangle$ the average (over the ‘historical’ probability
distribution) of a certain observable $O$:

$$\langle O(x,t)\rangle \equiv \int P(x,t|x_0,0)\,O(x,t)\,dx. \tag{4.1}$$


As we have shown in Chapter 2, the price fluctuations are somewhat correlated
for small time intervals (a few minutes), but become rapidly uncorrelated (but not
necessarily independent!) on longer time scales. In the following, we shall choose
as our elementary time scale $\tau$ an interval a few times larger than the correlation
time, say $\tau = 30$ min on liquid markets. We shall thus assume that the correlations
of price increments on two different intervals of size $\tau$ are negligible.⁵ When
correlations are small, the information on future movements based on the study
of past fluctuations is weak. In this case, no systematic trading strategy can be
more profitable (in the long run) than holding the market index; of course, one
can temporarily ‘beat’ the market through sheer luck. This property corresponds to
the efficient market hypothesis.⁶

It is interesting to translate this property into more formal terms. Let us suppose
that at time $t_n = n\tau$, an investor has a portfolio containing, in particular, a quantity
$\phi_n(x_n)$ of the asset X, a quantity which can depend on the price of the asset $x_n =
x(t_n)$ at time $t_n$ (this strategy could actually depend on the price of the asset for all
previous times: $\phi_n(x_n, x_{n-1}, x_{n-2},\ldots)$). Between $t_n$ and $t_{n+1}$, the price of X varies
by $\delta x_n$. This generates a profit (or a loss) for the investor equal to $\phi_n(x_n)\,\delta x_n$. Note
that the change of wealth is not equal to $\delta(\phi x) = \phi\,\delta x + x\,\delta\phi$, since the second
term only corresponds to converting some stocks into cash, or vice versa, but not
to a real change of wealth. The wealth difference between time $t = 0$ and time


4 A weaker hypothesis is that the statistical ‘texture’ of the markets (i.e. the shape of the probability
distributions) is stable over time, but that the parameters which fix the amplitude of the fluctuations can
be time dependent, see Sections 1.7, 2.4 and 4.3.4.

5 The presence of small correlations on larger time scales is however difficult to exclude, in particular on
the scale of several years, which might reflect economic cycles for example. Note also that the ‘volatility’
fluctuations exhibit long-range correlations, cf. Section 2.4.

6 The existence of successful systematic hedge funds, which have consistently produced returns higher than the
average for several years, suggests that some sort of ‘hidden’ correlations do exist, at least on certain markets.
But if they exist these correlations must be small and therefore are not relevant for our main concerns: risk
control and option pricing.
$t_N = T = N\tau$, due to the trading of asset X is, for zero interest rates, equal to:

$$\Delta W_X = \sum_{n=0}^{N-1}\phi_n(x_n)\,\delta x_n. \tag{4.2}$$

Since the trading strategy $\phi_n(x_n)$ can only be chosen before the price actually
changes, the absence of correlations means that the average value of $\Delta W_X$ (with
respect to the historical probability distribution) reads:

$$\langle\Delta W_X\rangle = \sum_{n=0}^{N-1}\langle\phi_n(x_n)\rangle\langle\delta x_n\rangle \equiv \tilde m\tau\sum_{n=0}^{N-1}\langle x_n\phi_n(x_n)\rangle, \tag{4.3}$$

where we have introduced the average return $\tilde m$ of the asset X, defined as:

$$\delta x_n \equiv \eta_nx_n, \qquad \tilde m\tau = \langle\eta_n\rangle. \tag{4.4}$$


The above equation (4.3) thus means that the average value of the profit is fixed
by the average return of the asset, weighted by the level of investment across the
considered period. We shall often use, in the following, an additive model (more
adapted on short time scales, cf. Section 2.2.1) where $\delta x_n$ is rather written as $\delta x_n =
\eta_nx_0$. Correspondingly, the average return over the time interval $\tau$ reads: $m_1 =
\langle\delta x\rangle = \tilde m\tau x_0$. This approximation is usually justified as long as the total time
interval $T$ corresponds to a small (average) relative increase of price: $\tilde mT \ll 1$. We
will often denote as $m$ the average return per unit time: $m = m_1/\tau$.


Trading in the presence of temporal correlations 


It is interesting to investigate the case where correlations are not zero. For simplicity, we
shall assume that the fluctuations $\delta x_n$ are stationary Gaussian variables of zero mean
($m_1 = 0$). The correlation function is then given by $\langle\delta x_n\delta x_k\rangle = C_{nk}$. The $C^{-1}_{nk}$'s are
the elements of the matrix inverse of $C$. If one knows the sequence of past increments
$\delta x_0,\ldots,\delta x_{n-1}$, the distribution of the next $\delta x_n$ conditioned to such an observation is
simply given by:

$$P(\delta x_n) = \mathcal N\exp\left(-\frac{C^{-1}_{nn}}{2}\,[\delta x_n - m_n]^2\right), \tag{4.5}$$

$$m_n \equiv -\frac{\sum_{i=0}^{n-1}C^{-1}_{in}\,\delta x_i}{C^{-1}_{nn}}, \tag{4.6}$$

where $\mathcal N$ is a normalization factor, and $m_n$ the mean of $\delta x_n$ conditioned to the past, which
is non-zero precisely because some correlations are present.⁷ A simple strategy which
exploits these correlations is to choose:

$$\phi_n(x_n, x_{n-1},\ldots) = \mathrm{sign}(m_n), \tag{4.7}$$


7 The notation $m_n$ has already been used in Chapter 1 with a different meaning. Note also that a general formula
exists for the distribution of $\delta x_{n+k}$, for all $k > 0$, and can be found in books on optimal filter theory, see
references.
which means that one buys (resp. sells) one stock if the expected next increment is positive
(resp. negative). The average profit is then obviously given by $\langle|m_n|\rangle > 0$. We further
assume that the correlations are short ranged, such that only $C^{-1}_{nn}$ and $C^{-1}_{n\,n-1}$ are non-zero.
The unit time interval is then the above correlation time $\tau$. If this strategy is used during
the time $T = N\tau$, the average profit is given by:

$$\langle\Delta W_X\rangle = Na\,\langle|\delta x|\rangle \quad\text{where}\quad a \equiv \left|\frac{C^{-1}_{n\,n-1}}{C^{-1}_{nn}}\right|, \tag{4.8}$$

($a$ does not depend on $n$ for a stationary process). With typical values, $\tau = 30$ min,
$\langle|\delta x|\rangle \approx 10^{-3}x_0$, and $a$ of about 0.1 (cf. Section 2.2.2), the average profit would reach
50% annual!⁸ Hence, the presence of correlations (even rather weak) allows in principle
one to make rather substantial gains. However, one should remember that some transaction 
costs are always present in some form or other (for example the bid-ask spread is a form 
of transaction cost). Since our assumption is that the correlation time is equal to t, it 
means that the frequency with which our trader has to ‘flip’ his strategy (i.e. $\phi \to -\phi$)
is $\tau^{-1}$. One should thus subtract from Eq. (4.8) a term on the order of $-N\nu x_0$, where $\nu$
represents the fractional cost of a transaction. A very small value of the order of $10^{-4}$
is thus enough to cancel completely the profit resulting from the observed correlations
on the markets. Interestingly enough, the ‘basis point’ ($10^{-4}$) is precisely the order of
magnitude of the friction faced by traders with the most direct access to the markets. More 
generally, transaction costs allow the existence of correlations, and thus of ‘theoretical’ 
inefficiencies of the markets. The strength of the ‘allowed’ correlations on a certain time $T$
is on the order of the transaction costs $\nu$ divided by the volatility of the asset on the scale
of $T$.
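The conditional mean of Eq. (4.6) and the orders of magnitude quoted above can be checked with a short Python sketch. It is only an illustration: the function names, the example covariance matrix (unit variance with correlations decaying as $0.1^{|n-k|}$, which has a tridiagonal inverse and gives $a = 0.1$) and the round figure of 5000 half-hour intervals per year are assumptions, not values given in the text.

```python
import numpy as np

def conditional_mean(C, past):
    """Mean of the next increment given the past ones, Eq. (4.6).

    C is the (n+1) x (n+1) covariance matrix of (dx_0, ..., dx_n);
    `past` holds the n observed increments dx_0, ..., dx_{n-1}.
    """
    precision = np.linalg.inv(C)      # elements C^{-1}_{ik}
    n = len(past)
    return -precision[:n, n] @ past / precision[n, n]

def net_return(n_intervals, a, abs_move, cost):
    """Eq. (4.8) minus transaction costs, expressed in units of x0."""
    return n_intervals * (a * abs_move - cost)

if __name__ == "__main__":
    # Hypothetical short-range correlated covariance (illustration only).
    lags = np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
    C = 0.1 ** lags
    past = np.array([0.3, -0.2, 0.5])
    print("conditional mean:", conditional_mean(C, past))
    # 5000 intervals per year is an assumed round number.
    print("gross return:", net_return(5000, 0.1, 1e-3, 0.0))
    print("net of 1e-4 costs:", net_return(5000, 0.1, 1e-3, 1e-4))
```

For this particular covariance only the last observed increment matters, so the conditional mean is 0.1 times that increment, and a fractional cost of $10^{-4}$ per interval removes the whole gross profit.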


At this stage, it would appear that devising a ‘theory’ for trading is useless: in the 
absence of correlations, speculating on stock markets is tantamount to playing the 
roulette (the zero playing the role of transaction costs!). In fact, as will be discussed 
in the present chapter, the existence of riskless assets such as bonds, which yield 
a known return, and the emergence of more complex financial products such as 
futures contracts or options, demand an adapted theory for pricing and hedging. 
This theory turns out to be predictive, even in the complete absence of temporal 
correlations. 


4.2 Futures and forwards 
4.2.1 Setting the stage 


Before turning to the rather complex case of options, we shall first focus on the 
very simple case of forward contracts, which allows us to define a certain number 
of notions (such as arbitrage and hedging) and notations. A forward contract F 
amounts to buying or selling today an asset X (the ‘underlying’) with a delivery 


8 A strategy that allows one to generate consistent abnormal returns with zero or minimal risk is called an
‘arbitrage opportunity’.
date $T = N\tau$ in the future.⁹ What is the price $F$ of this contract, knowing that it
must be paid at the date of expiry?¹⁰ The naive answer that first comes to mind is
a ‘fair game’ condition: the price must be adjusted such that, on average, the two
parties involved in the contract break even on the day of expiry. For example, taking
the point of view of the writer of the contract (who sells the forward), the wealth
balance associated with the forward reads:

$$\Delta W_F = F - x(T). \tag{4.9}$$


This actually assumes that the writer has not considered the possibility of simul- 
taneously trading the underlying stock X to reduce his risk, which of course he 
should do, see below. 

Under this assumption, the fair game condition amounts to setting $\langle\Delta W_F\rangle = 0$,
which gives for the forward price:

$$F_B = \langle x(T)\rangle \equiv \int x\,P(x,T|x_0,0)\,dx, \tag{4.10}$$

if the price of X is $x_0$ at time $t = 0$. This price, which we shall call the ‘Bachelier’
price, is not satisfactory since the seller takes the risk that the price at expiry $x(T)$
ends up far above $\langle x(T)\rangle$, which could prompt him to increase his price above
$F_B$.

Actually, the Bachelier price $F_B$ is not related to the market price for a simple
reason: the seller of the contract can suppress his risk completely if he buys now the
underlying asset X at price $x_0$ and waits for the expiry date. However, this strategy
is costly: the amount of cash frozen during that period does not yield the riskless
interest rate. The cost of the strategy is thus $x_0e^{rT}$, where $r$ is the interest rate per
unit time. From the viewpoint of the buyer, it would certainly be absurd to pay
more than $x_0e^{rT}$, which is the cost of borrowing the cash needed to pay for the asset
right away. The only viable price for the forward contract is thus $F = x_0e^{rT} \neq F_B$,
and is, in this particular case, completely unrelated to the potential moves of the
asset X!

An elementary argument thus allows one to know the price of the forward 
contract and to follow a perfect hedging strategy: buy a quantity ¢ = 1 of the 
underlying asset during the whole life of the contract. The aim of the next paragraph is
to establish this rather trivial result, which has not required any maths, in a much 
more sophisticated way. The importance of this procedure is that one needs to 


9 In practice ‘futures’ contracts are more common than forwards. While forwards are over-the-counter contracts,
futures are traded on organized markets. For forwards there are typically no payments from either side before
the expiration date whereas futures are marked-to-market and compensated every day, meaning that payments
are made daily by one side or the other to bring the value of the contract back to zero.

10 Note that if the contract was to be paid now, its price would be exactly equal to that of the underlying asset 
(barring the risk of delivery default). 
learn how to write down a proper wealth balance in order to price more complex
derivative products such as options, on which we shall focus in the next sections. 


4.2.2 Global financial balance 


Let us write a general financial balance which takes into account the trading
strategy of the underlying asset X. The difficulty lies in the fact that the amount
$\phi_nx_n$ which is invested in the asset X rather than in bonds is effectively costly: one
‘misses’ the risk-free interest rate. Furthermore, this loss cumulates in time. It is
not a priori obvious how to write down the correct balance. Suppose that only two
assets have to be considered: the risky asset X, and a bond B. The whole capital
$W_n$ at time $t_n = n\tau$ is shared between these two assets:

$$W_n = \phi_nx_n + B_n. \tag{4.11}$$

The time evolution of $W_n$ is due both to the change of price of the asset X, and to
the fact that the bond yields a known return through the interest rate $r$:

$$W_{n+1} - W_n = \phi_n(x_{n+1} - x_n) + B_n\rho, \qquad \rho = r\tau. \tag{4.12}$$

On the other hand, the amount of capital in bonds evolves both mechanically,
through the effect of the interest rate ($+B_n\rho$), but also because the quantity of
stock does change in time ($\phi_n \to \phi_{n+1}$), and causes a flow of money from bonds
to stock or vice versa. Hence:

$$B_{n+1} - B_n = B_n\rho - x_{n+1}(\phi_{n+1} - \phi_n). \tag{4.13}$$


Note that Eq. (4.11) is obviously compatible with the two equations that follow it,
Eqs (4.12) and (4.13). The solution of the second of these, Eq. (4.13), reads:

$$B_n = (1+\rho)^nB_0 - \sum_{k=1}^{n}x_k(\phi_k - \phi_{k-1})(1+\rho)^{n-k}. \tag{4.14}$$

Plugging this result in Eq. (4.11), and renaming $k - 1 \to k$ in the second part of
the sum, one finally ends up with the following central result for $W_n$:

$$W_n = W_0(1+\rho)^n + \sum_{k=0}^{n-1}\psi_k^n\,(x_{k+1} - x_k - \rho x_k), \tag{4.15}$$

with $\psi_k^n \equiv \phi_k(1+\rho)^{n-k-1}$. This last expression has an intuitive meaning: the
gain or loss incurred between time $k$ and $k+1$ must include the cost of the
hedging strategy $-\rho x_k$; furthermore, this gain or loss must be forwarded up to
time $n$ through interest rate effects, hence the extra factor $(1+\rho)^{n-k-1}$. Another
useful way to write and understand Eq. (4.15) is to introduce the discounted prices
$\tilde x_k \equiv x_k(1+\rho)^{-k}$. One then has:

$$W_n = (1+\rho)^n\left(W_0 + \sum_{k=0}^{n-1}\phi_k\,(\tilde x_{k+1} - \tilde x_k)\right). \tag{4.16}$$

The effect of interest rates can be thought of as an erosion of the value of the money
itself. The discounted prices $\tilde x_k$ are therefore the ‘true’ prices, and the difference
$\tilde x_{k+1} - \tilde x_k$ the ‘true’ change of wealth. The overall factor $(1+\rho)^n$ then converts this
true wealth into the current value of the money.
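The equivalence of the step-by-step balance, Eqs (4.11) to (4.13), with the closed forms (4.15) and (4.16) can be checked numerically. The Python sketch below is only an illustration under assumed inputs: the function names are hypothetical, and the price path and strategy are arbitrary arrays `x` and `phi` holding $x_0,\ldots,x_n$ and $\phi_0,\ldots,\phi_n$.

```python
import numpy as np

def wealth_by_recursion(x, phi, B0, rho):
    """Iterate the bond account of Eq. (4.13) and return W_n from Eq. (4.11)."""
    B = B0
    for k in range(len(x) - 1):
        # Bonds earn interest and finance (or absorb) the change in stock holding.
        B = B * (1.0 + rho) - x[k + 1] * (phi[k + 1] - phi[k])
    return phi[-1] * x[-1] + B

def wealth_closed_form(x, phi, W0, rho):
    """Evaluate Eq. (4.15) directly."""
    n = len(x) - 1
    k = np.arange(n)
    psi = phi[:n] * (1.0 + rho) ** (n - k - 1)
    return W0 * (1.0 + rho) ** n + np.sum(psi * (x[1:] - x[:-1] - rho * x[:-1]))

def wealth_discounted(x, phi, W0, rho):
    """Evaluate Eq. (4.16) using the discounted prices x_k (1 + rho)^(-k)."""
    n = len(x) - 1
    x_tilde = x * (1.0 + rho) ** (-np.arange(n + 1))
    return (1.0 + rho) ** n * (W0 + np.sum(phi[:n] * np.diff(x_tilde)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = 100.0 + np.concatenate(([0.0], np.cumsum(rng.normal(0.0, 1.0, 20))))
    phi = rng.uniform(0.0, 2.0, 21)
    B0, rho = 50.0, 0.001
    W0 = phi[0] * x[0] + B0
    print(wealth_by_recursion(x, phi, B0, rho),
          wealth_closed_form(x, phi, W0, rho),
          wealth_discounted(x, phi, W0, rho))
```

The three printed numbers should agree up to rounding, whatever path and strategy are supplied.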

The global balance associated with the forward contract contains two further
terms: the price of the forward $F$ that one wishes to determine, and the price of the
underlying asset at the delivery date. Hence, finally:

$$W_N = F - x_N + (1+\rho)^N\left(W_0 + \sum_{k=0}^{N-1}\phi_k\,(\tilde x_{k+1} - \tilde x_k)\right). \tag{4.17}$$

Since by identity $\tilde x_N = \sum_{k=0}^{N-1}(\tilde x_{k+1} - \tilde x_k) + x_0$, this last expression can also be
written as:

$$W_N = F + (1+\rho)^N\left(W_0 - x_0 + \sum_{k=0}^{N-1}(\phi_k - 1)(\tilde x_{k+1} - \tilde x_k)\right). \tag{4.18}$$
k=0 


4.2.3 Riskless hedge 


In this last formula, all the randomness, the uncertainty on the future evolution of
the prices, only appears in the last term. But if one chooses $\phi_k$ to be identically
equal to one, then the global balance is completely independent of the evolution of
the stock price. The final result is not random, and reads:

$$W_N = F + (1+\rho)^N(W_0 - x_0). \tag{4.19}$$

Now, the wealth of the writer of the forward contract at time $T = N\tau$ cannot be,
with certitude, greater (or less) than the one he would have had if the contract had
not been signed, i.e. $W_0(1+\rho)^N$. If this was the case, one of the two parties would
be losing money in a totally predictable fashion (since Eq. (4.19) does not contain
any unknown term). Since any reasonable participant is reluctant to give away his
money with no hope of return, one concludes that the forward price is indeed given
by:

$$F = x_0(1+\rho)^N \simeq x_0e^{rT} \neq F_B, \tag{4.20}$$

which does not rely on any statistical information on the price of X!
Dividends


In the case of a stock that pays a constant dividend rate $\delta = d\tau$ per interval of time $\tau$, the
global wealth balance is obviously changed into:

$$W_N = F - x_N + W_0(1+\rho)^N + \sum_{k=0}^{N-1}\psi_k^N\,(x_{k+1} - x_k + (\delta - \rho)x_k). \tag{4.21}$$

It is easy to redo the above calculations in this case. One finds that the riskless strategy is
now to hold:

$$\phi_k^* \equiv \frac{(1+\rho-\delta)^{N-k-1}}{(1+\rho)^{N-k-1}} \tag{4.22}$$

stocks. The wealth balance breaks even if the price of the forward is set to:

$$F = x_0(1+\rho-\delta)^N \simeq x_0e^{(r-d)T}, \tag{4.23}$$

which again could have been obtained using a simple no-arbitrage argument of the type
presented below.
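That the strategy (4.22) removes all price risk can be verified on simulated paths. The sketch below is a hedged illustration in Python: the function names and numerical values are hypothetical, and setting `delta = 0` recovers the dividend-free case of Eqs (4.19) and (4.20).

```python
import numpy as np

def forward_balance(x, phi, F, W0, rho, delta=0.0):
    """Final wealth of the forward writer, Eq. (4.21) (Eq. (4.17) when delta = 0)."""
    N = len(x) - 1
    k = np.arange(N)
    psi = phi * (1.0 + rho) ** (N - k - 1)
    gains = x[1:] - x[:-1] + (delta - rho) * x[:-1]
    return F - x[-1] + W0 * (1.0 + rho) ** N + np.sum(psi * gains)

def riskless_hedge(N, rho, delta=0.0):
    """Holdings phi_k of Eq. (4.22); identically one when delta = 0."""
    k = np.arange(N)
    return ((1.0 + rho - delta) / (1.0 + rho)) ** (N - k - 1)

def forward_price(x0, N, rho, delta=0.0):
    """Break-even forward price, Eq. (4.23) (Eq. (4.20) when delta = 0)."""
    return x0 * (1.0 + rho - delta) ** N

if __name__ == "__main__":
    N, rho, delta, W0, x0 = 30, 0.002, 0.0005, 10.0, 100.0
    phi = riskless_hedge(N, rho, delta)
    F = forward_price(x0, N, rho, delta)
    rng = np.random.default_rng(2)
    for _ in range(3):
        # Arbitrary random price paths starting from x0 (illustration only).
        x = x0 + np.concatenate(([0.0], np.cumsum(rng.normal(0.0, 2.0, N))))
        print(forward_balance(x, phi, F, W0, rho, delta), W0 * (1.0 + rho) ** N)
```

Each printed pair should coincide: with the hedge in place and the forward priced at break-even, the writer ends up with $W_0(1+\rho)^N$ on every path.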


Variable interest rates 


In reality, the interest rate is not constant in time but rather also varies randomly. More
precisely, as explained in Section 2.6, at any instant of time the whole interest rate curve
for different maturities is known, but evolves with time. The generalization of the global
balance, as given by Eq. (4.17), depends on the maturity of the bonds included
in the portfolio, and thus on the whole interest rate curve. Assuming that only short-term
bonds are included, yielding the (time-dependent) ‘spot’ rate $\rho_k$, one has:

$$W_N = F - x_N + W_0\prod_{k=0}^{N-1}(1+\rho_k) + \sum_{k=0}^{N-1}\psi_k^N\,(x_{k+1} - x_k - \rho_kx_k), \tag{4.24}$$

with $\psi_k^N \equiv \phi_k\prod_{\ell=k+1}^{N-1}(1+\rho_\ell)$. It is again quite obvious that holding a quantity $\phi_k \equiv 1$
of the underlying asset leads to zero risk in the sense that the fluctuations of X disappear.
However, the above strategy is not immune to interest rate risk. The interesting complexity
of interest rate problems (such as the pricing and hedging of interest rate derivatives)
comes from the fact that one may choose bonds of arbitrary maturity to construct the
hedging strategy. In the present case, one may take a bond of maturity equal to that of the
forward. In this case, risk disappears entirely, and the price of the forward reads:

$$F = \frac{x_0}{B(0,N)}, \tag{4.25}$$

where $B(0,N)$ stands for the value, at time 0, of the bond maturing at time $N$.
4.2.4 Conclusion: global balance and arbitrage


From the simple example of forward contracts, one should bear in mind the 
following points, which are the key concepts underlying the derivative pricing 
theory as presented in this book. After writing down the complete financial balance, 
taking into account the trading of all assets used to cover the risk, it is quite 
natural (at least from the view point of the writer of the contract) to determine 
the trading strategy for all the hedging assets so as to minimize the risk associated 
to the contract. After doing so, a reference price is obtained by demanding that 
the global balance is zero on average, corresponding to a fair price from the point 
of view of both parties. In certain cases (such as the forward contracts described 
above), the minimum risk is zero and the true market price cannot differ from 
the fair price, or else arbitrage would be possible. In the example of forward
contracts, the price, Eq. (4.20), indeed corresponds to the absence of arbitrage
opportunities (AAO), that is, of riskless profit. Suppose for example that the price
of the forward is below $F = x_0(1+\rho)^N$. One can then sell the underlying asset
now at price $x_0$ and simultaneously buy the forward at a price $F' < F$, that
must be paid for on the delivery date. The cash $x_0$ is used to buy bonds with
a yield rate $\rho$. On the expiry date, the forward contract is used to buy back the
stock and close the position. The wealth of the trader is then $x_0(1+\rho)^N - F'$,
which is positive under our assumption, and furthermore fully determined at time
zero: there is profit, but no risk. Similarly, a price $F' > F$ would also lead to
an opportunity of arbitrage. More generally, if the hedging strategy is perfect, this
means that the final wealth is known in advance. Thus, increasing the price as
compared to the fair price leads to a riskless profit for the seller of the contract,
and vice versa. This AAO principle is at the heart of most derivative pricing
theories currently used. Unfortunately, this principle cannot be used in the general
case, where the minimal risk is non-zero, or when transaction costs are present
(and absorb the potential profit, see the discussion in Section 4.1.2 above). When
the risk is non-zero, there exists a fundamental ambiguity in the price, since
one should expect that a risk premium is added to the fair price (for example,
as a bid-ask spread). This risk premium depends both on the risk-averseness
of the market maker, but also on the liquidity of the derivative market: if the
price asked by one market maker is too high, less greedy market makers will
make the deal. This mechanism does however not operate for ‘over-the-counter’
operations (OTC, that is between two individual parties, as opposed to through an
organized market). We shall come back to this important discussion in Section 4.6
below.


Let us emphasize that the proper accounting of all financial elements in the 
wealth balance is crucial to obtain the correct fair price. For example, we have seen 
above that if one forgets the term corresponding to the trading of the underlying
stock, one ends up with the intuitive, but wrong, Bachelier price, Eq. (4.10).


4.3 Options: definition and valuation 
4.3.1 Setting the stage 


A buy option (or ‘call’ option) is nothing but an insurance policy, protecting the 
owner against the potential increase of the price of a given asset, which he will 
need to buy in the future. The call option provides to its owner the certainty of not 
paying the asset more than a certain price. Symmetrically, a ‘put’ option protects 
against drawdowns, by insuring to the owner a minimum value for his stock. 
More precisely, in the case of a so-called ‘European’ option,¹¹ the contract is
such that at a given date in the future (the ‘expiry date’ or ‘maturity’) $t = T$, the
owner of the option will not pay the asset more than $x_s$ (the ‘exercise price’, or
‘strike’ price): the possible difference between the market price at time $T$, $x(T)$,
and $x_s$ is taken care of by the writer of the option. Knowing that the price of the
underlying asset is $x_0$ now (i.e. at $t = 0$), what is the price (or ‘premium’) $C$ of
the call? What is the optimal hedging strategy that the writer of the option should
follow in order to minimize his risk?

The very first scientific theory of option pricing dates back to Bachelier in 
1900. His proposal was, following a fair game argument, that the option price 
should equal the average value of the pay-off of the contract at expiry. Bachelier 
calculated this average by assuming the price increments $\delta x_n$ to be independent
Gaussian random variables, which leads to the formula, Eq. (4.43) below. However,
Bachelier did not discuss the possibility of hedging, and therefore did not include
in his wealth balance the term corresponding to the trading strategy that we
have discussed in the previous section. As we have seen, this is precisely the
term responsible for the difference between the forward ‘Bachelier price’ $F_B$ (cf.
Eq. (4.10)) and the true price, Eq. (4.20). The problem of the optimal trading
strategy must thus, in principle, be solved before one can fix the price of the
option.¹² This is the problem that was solved by Black and Scholes in 1973, when
they showed that for a continuous-time Gaussian process, there exists a perfect
strategy, in the sense that the risk associated to writing an option is strictly zero, as
is the case for forward contracts. The determination of this perfect hedging strategy
allows one to fix completely the price of the option using an AAO argument.
Unfortunately, as repeatedly discussed below, this ideal strategy only exists in

11 Many other types of options exist: ‘American’, ‘Asian’, ‘Lookback’, ‘Digitals’, etc., see [Wilmott]. We will
discuss some of them in Chapter 5.

12 In practice, however, the influence of the trading strategy on the price (but not on the risk!) is quite small, see
below.
a continuous-time, Gaussian world,¹³ that is, if the market correlation time was
infinitely short, and if no discontinuities in the market price were allowed, both
assumptions rather remote from reality. The hedging strategy cannot, in general, be
perfect. However, an optimal hedging strategy can always be found, for which the
risk is minimal (cf. Section 4.4). This optimal strategy thus allows one to calculate 
the fair price of the option and the associated residual risk. One should nevertheless 
bear in mind that, as emphasized in Sections 4.2.4 and 4.6, there is no such thing 
as a unique option price whenever the risk is non-zero. 

Let us now discuss these ideas more precisely. Following the method and 
notations introduced in the previous section, we write the global wealth balance 
for the writer of an option between time t = 0 and t = T as: 


$$W_N = [W_0 + C](1+\rho)^N - \max(x_N - x_s, 0) + \sum_{k=0}^{N-1}\psi_k^N\,(x_{k+1} - x_k - \rho x_k), \tag{4.26}$$


which reflects the fact that: 


• The premium $C$ is paid immediately (i.e. at time $t = 0$).

• The writer of the option incurs a loss $x_N - x_s$ only if the option is exercised
($x_N > x_s$).

• The hedging strategy requires that the writer convert a certain amount of bonds
into the underlying asset, as was discussed before Eq. (4.17).


A crucial difference with forward contracts comes from the non-linear nature of
the pay-off, which is equal, in the case of a European option, to $\mathcal Y(x_N) \equiv \max(x_N -
x_s, 0)$. This must be contrasted with the forward pay-off, which is linear (and equal
to $x_N$). It is ultimately the non-linearity of the pay-off which, combined with the
non-Gaussian nature of the fluctuations, leads to a non-zero residual risk.

An equation for the call price C is obtained by requiring that the excess return 
due to writing the option, AW = Wy — Wo(1 + »)%, is zero on average: 


N-1 
(1+ pyXC = | (max(xy — xs.0)) — SOW rar — ae — aed) |. (4.27) 
k=0 ep 

This price therefore depends, in principle, on the strategy yj’ = "(1 + p)%-*!. 

This price corresponds to the fair price, to which a risk premium will in general be 
added (for example in the form of a bid-ask spread). 
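
The wealth balance of Eq. (4.26) can be evaluated on a single price path, which may help to see how the premium, the pay-off and the hedging gains combine. The sketch below is only an illustration: the function name `wealth_balance`, its argument names and the sample path are hypothetical and do not come from the text.

```python
def wealth_balance(path, psi, premium, w0, rho, strike):
    """Final wealth W_N of the option writer, following Eq. (4.26).

    path  : prices x_0 ... x_N (length N + 1)
    psi   : hedge positions psi_0 ... psi_{N-1} (length N)
    rho   : interest rate per elementary time step
    """
    n = len(path) - 1
    if len(psi) != n:
        raise ValueError("need exactly one hedge position per time step")
    payoff = max(path[-1] - strike, 0.0)
    # Gain of the hedge, net of the interest lost on the cash invested in the asset.
    hedge_gain = sum(
        psi[k] * (path[k + 1] - path[k] - rho * path[k]) for k in range(n)
    )
    return (w0 + premium) * (1.0 + rho) ** n - payoff + hedge_gain


# Illustrative path (made-up numbers): premium 4, strike 100, half a share held throughout.
example = wealth_balance(
    [100.0, 102.0, 101.0, 105.0], [0.5, 0.5, 0.5],
    premium=4.0, w0=0.0, rho=0.0, strike=100.0,
)
# example is 4 - 5 + 0.5 * (105 - 100) = 1.5
```

Averaging this quantity over many paths and requiring the excess over $W_0(1+\rho)^N$ to vanish gives back the pricing condition, Eq. (4.27).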

In the rather common case where the underlying asset is not a stock but a forward
on the stock, the hedging strategy is less costly since only a small fraction f of the
value of the stock is required as a deposit. In the case where f = 0, the wealth
balance appears to take a simpler form, since the interest rate is not lost while
trading the underlying forward contract:

$$C = (1+\rho)^{-N} \left[ \langle \max(F_N - x_s, 0) \rangle - \sum_{k=0}^{N-1} \langle \phi_k^N (F_{k+1} - F_k) \rangle \right]. \tag{4.28}$$

However, one can check that if one expresses the price of the forward in terms of the
underlying stock, Eqs (4.27) and (4.28) are actually identical. (Note in particular
that $F_N = x_N$.)

13 A property also shared by a discrete binomial evolution of prices, where at each time step, the price increment
δx can only take two values, see Appendix E. However, the risk is non-zero as soon as the number of possible
price changes exceeds two.


4.3.2 Orders of magnitude


Let us suppose that the maturity of the option is such that non-Gaussian ‘tail’
effects can be neglected ($T > T^*$, cf. Sections 1.6.3, 1.6.5, 2.3), so that the
distribution of the terminal price x(T) can be approximated by a Gaussian of mean
mT and variance $DT = \sigma^2 x_0^2 T$.¹⁴ If the average trend is small compared to the
RMS $\sqrt{DT}$, a direct calculation for ‘at-the-money’ options gives:¹⁵

$$\langle \max(x(T) - x_s, 0) \rangle = \int_{x_s}^{\infty} \frac{x - x_s}{\sqrt{2\pi DT}} \exp\left( -\frac{(x - x_0 - mT)^2}{2DT} \right) dx \simeq \sqrt{\frac{DT}{2\pi}} + \frac{mT}{2} + O\left( \sqrt{\frac{m^4 T^3}{D}} \right). \tag{4.29}$$

Taking T = 100 days, a daily volatility of σ = 1%, an annual return of m = 5%,
and a stock such that $x_0 = 100$ points, one gets:

$$\sqrt{\frac{DT}{2\pi}} \simeq 4 \text{ points}, \qquad \frac{mT}{2} \simeq 0.67 \text{ points}. \tag{4.30}$$

In fact, the effect of a non-zero average return of the stock is much less than
the above estimation would suggest. The reason is that one must also take into
account the last term of the balance equation, Eq. (4.27), which comes from the
hedging strategy. This term corrects the price by an amount $-\langle\phi\rangle mT$. Now, as
we shall see later, the optimal strategy for an at-the-money option is to hold, on
average, $\langle\phi\rangle = 1/2$ stocks per option. Hence, this term precisely compensates the
increase (equal to mT/2) of the average pay-off, $\langle \max(x(T) - x_s, 0) \rangle$. This strange
compensation is actually exact in the Black-Scholes model (where the price of
the option turns out to be completely independent of the value of m) but only
approximate in more general cases. However, it is a very good first approximation
to neglect the dependence of the option price on the average return of the stock: we
shall come back to this point in detail in Section 4.5.

14 We again neglect the difference between Gaussian and log-normal statistics in the following order of
magnitude estimate. See Sections 1.3.2, 2.2.1, Eq. (4.43) and Figure 4.1 below.

15 An option is called ‘at-the-money’ if the strike price is equal to the current price of the underlying stock
($x_s = x_0$), ‘out-of-the-money’ if $x_s > x_0$ and ‘in-the-money’ if $x_s < x_0$.

The interest rate appears in two different places in the balance equation, 
Eq. (4.27): in front of the call premium C, and in the cost of the trading strategy. 
For ρ = rτ corresponding to 5% per year, the factor $(1+\rho)^{-N}$ corrects the option
price by a very small amount, which, for T = 100 days, is equal to 0.06 points.
However, the cost of the trading strategy is not negligible. Its order of magnitude
is $\sim \langle\phi\rangle x_0 r T$: in the above numerical example, it corresponds to a price increase
of 2/3 of a point (16% of the option price), see Section 5.1.
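
The orders of magnitude quoted in this section can be reproduced in a few lines. This is a rough check rather than part of the original text: the variable names are ours, and the conversion of 100 days into a fraction of a year uses a 365-day convention, which is an assumption (the text does not state one) and yields about 0.68 rather than the quoted 0.67 points.

```python
import math

x0 = 100.0             # current price, in points
sigma_daily = 0.01     # daily volatility (1%)
T_days = 100.0         # maturity, in days
m_annual = 0.05        # average annual return
r_annual = 0.05        # annual interest rate
days_per_year = 365.0  # assumption: calendar-day convention

D = (sigma_daily * x0) ** 2                          # variance per day, in points^2
atm_price = math.sqrt(D * T_days / (2.0 * math.pi))  # leading term of Eq. (4.29)

T_years = T_days / days_per_year
drift_term = m_annual * x0 * T_years / 2.0           # mT/2 in Eq. (4.30)
discount_corr = atm_price * r_annual * T_years       # effect of the factor (1 + rho)^(-N)
hedge_cost = 0.5 * x0 * r_annual * T_years           # <phi> x0 r T with <phi> = 1/2
hedge_cost_fraction = hedge_cost / atm_price         # roughly one sixth of the price
```

The discounting correction comes out near 0.05 points and the cost of the strategy near two thirds of a point, in line with the figures above.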


4.3.3 Quantitative analysis - option price


We shall assume in the following that the price increments can be written as:

$$x_{k+1} - x_k = \rho x_k + \delta x_k, \tag{4.31}$$

where $\delta x_k$ is a random variable having the characteristics discussed in Chapter 2,
and a mean value equal to $\langle \delta x_k \rangle = m_1 \equiv m\tau$. The above order of magnitude
estimate suggests that the influence of a non-zero value of m leads to small
corrections in comparison with the potential fluctuations of the stock price. We
shall thus temporarily set m = 0 to simplify the following discussion, and come
back to the corrections brought about by a non-zero value of m in Section 4.5.

Since the hedging strategy $\psi_k$ is obviously determined before the next random
change of price $\delta x_k$, these two quantities are uncorrelated, and one has:

$$\langle \psi_k \delta x_k \rangle = \langle \psi_k \rangle \langle \delta x_k \rangle = 0 \qquad (m = 0). \tag{4.32}$$

In this case, the hedging strategy disappears from the price of the option, which is
given by the following ‘Bachelier’-like formula, generalized to the case where the
increments are not necessarily Gaussian:

$$C = (1+\rho)^{-N} \langle \max(x_N - x_s, 0) \rangle \equiv (1+\rho)^{-N} \int_{x_s}^{\infty} (x - x_s) P(x, N|x_0, 0)\, dx. \tag{4.33}$$


In order to use Eq. (4.33) concretely, one needs to specify a model for price
increments. In order to recover the classical model of Black and Scholes, let us
first assume that the relative returns are iid random variables and write
$\delta x_k \equiv \eta_k x_k$, with $\eta_k \ll 1$. If one knows (from empirical observation) the distribution
$P_1(\eta_k)$ of returns over the elementary time scale τ, one can easily reconstruct (using
the independence of the returns) the distribution $P(x, N|x_0, 0)$ needed to compute
C. After changing variables to $x \to x_0 (1+\rho)^N e^y$, the formula, Eq. (4.33), is
transformed into:

$$C = x_0 \int_{y_s}^{\infty} (e^y - e^{y_s}) P_N(y)\, dy, \tag{4.34}$$

where $y_s \equiv \log(x_s/x_0[1+\rho]^N)$ and $P_N(y) \equiv P(y, N|0, 0)$. Note that $y_s$ involves
the ratio of the strike price to the forward price of the underlying asset, $x_0[1+\rho]^N$.
Setting $x_k = x_0 (1+\rho)^k e^{y_k}$, the evolution of the $y_k$'s is given by:

$$y_{k+1} - y_k \simeq \frac{\eta_k}{1+\rho} - \frac{\eta_k^2}{2}, \qquad y_0 = 0, \tag{4.35}$$

where third-order terms ($\eta^3$, $\eta^2\rho$, ...) have been neglected. The distribution of the
quantity $y_N = \sum_{k=0}^{N-1} [\eta_k/(1+\rho) - \eta_k^2/2]$ is then obtained, in Fourier space,
as:

$$\hat{P}_N(z) = [\tilde{P}_1(z)]^N, \tag{4.36}$$

where we have defined, in the right-hand side, a modified Fourier transform:

$$\tilde{P}_1(z) = \int P_1(\eta) \exp\left[ iz \left( \frac{\eta}{1+\rho} - \frac{\eta^2}{2} \right) \right] d\eta. \tag{4.37}$$


The Black and Scholes limit 


We can now examine the Black-Scholes limit, where $P_1(\eta)$ is a Gaussian of zero
mean and RMS equal to $\sigma_1 = \sigma\sqrt{\tau}$. Using the above Eqs (4.36) and (4.37) one
finds, for N large:¹⁶

$$P_N(y) = \frac{1}{\sqrt{2\pi N \sigma_1^2}} \exp\left( -\frac{(y + N\sigma_1^2/2)^2}{2N\sigma_1^2} \right). \tag{4.38}$$

16 In fact, the variance of $P_N(y)$ is equal to $N\sigma_1^2/(1+\rho)^2$, but we neglect this small correction which disappears
in the limit τ → 0.

The Black-Scholes model also corresponds to the limit where the elementary time
interval τ goes to zero, in which case one of course has N = T/τ → ∞. As
discussed in Chapter 2, this limit is not very realistic since any transaction takes at
least a few seconds and, more significantly, some correlations persist over at
least several minutes. Notwithstanding, if one takes the mathematical limit where
τ → 0, with N = T/τ → ∞ but keeping the product $N\sigma_1^2 = T\sigma^2$ finite,
one constructs the continuous-time log-normal process, or ‘geometrical Brownian
motion’. In this limit, using the above form for $P_N(y)$ and $(1+\rho)^N \to e^{rT}$, one
obtains the celebrated Black-Scholes formula:

$$C_{BS}(x_0, x_s, T) = x_0 \int_{y_s}^{\infty} \frac{e^y - e^{y_s}}{\sqrt{2\pi\sigma^2 T}} \exp\left( -\frac{(y + \sigma^2 T/2)^2}{2\sigma^2 T} \right) dy = x_0 P_{G>}\left( \frac{y_-}{\sigma\sqrt{T}} \right) - x_s e^{-rT} P_{G>}\left( \frac{y_+}{\sigma\sqrt{T}} \right), \tag{4.39}$$

where $y_\pm = \log(x_s/x_0) - rT \pm \sigma^2 T/2$ and $P_{G>}(u)$ is the cumulative normal
distribution defined by Eq. (1.68). The way $C_{BS}$ varies with the four parameters
$x_0$, $x_s$, T and σ is quite intuitive, and discussed at length in all the books which
deal with options. The derivatives of the price with respect to these parameters are
now called the ‘Greeks’, because they are denoted by Greek letters. For example,
the larger the price of the stock $x_0$, the more probable it is to reach the strike price
at expiry, and the more expensive the option. Hence the so-called ‘Delta’ of the
option ($\Delta = \partial C/\partial x_0$) is positive. As we shall show below, this quantity is actually
directly related to the optimal hedging strategy. The variation of Δ with $x_0$ is noted
Γ (Gamma) and is defined as: $\Gamma = \partial\Delta/\partial x_0$. Similarly, the dependence in maturity
T and volatility σ (which in fact only appear through the combination $\sigma\sqrt{T}$ if rT is
small) leads to the definition of two further ‘Greeks’: $\Theta = -\partial C/\partial T = \partial C/\partial t < 0$
and ‘Vega’ $\mathcal{V} = \partial C/\partial\sigma > 0$: the higher $\sigma\sqrt{T}$, the higher the call premium.
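
As a minimal numerical sketch of Eq. (4.39), the function below evaluates the Black-Scholes price using $P_{G>}(u)$ written in terms of the complementary error function. The function names are ours; σ and r are understood per unit of the time in which T is measured.

```python
import math


def p_g_gt(u):
    """P_G>(u): probability that a standard normal variable exceeds u."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def black_scholes_call(x0, xs, T, sigma, r=0.0):
    """Call price of Eq. (4.39); sigma and r are per unit of T."""
    s = sigma * math.sqrt(T)
    y_minus = math.log(xs / x0) - r * T - 0.5 * sigma ** 2 * T
    y_plus = math.log(xs / x0) - r * T + 0.5 * sigma ** 2 * T
    return x0 * p_g_gt(y_minus / s) - xs * math.exp(-r * T) * p_g_gt(y_plus / s)


# The numerical example of Section 4.3.2: T = 100 days, daily volatility 1%, zero rate.
c_atm = black_scholes_call(100.0, 100.0, 100.0, 0.01)

# A finite-difference estimate of the Delta, which should be positive.
delta = (black_scholes_call(100.5, 100.0, 100.0, 0.01)
         - black_scholes_call(99.5, 100.0, 100.0, 0.01)) / 1.0
```

For these parameters the at-the-money price is close to the 4 points estimated in Eq. (4.30), and the Delta is close to 1/2.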


Bachelier’s Gaussian limit 


Suppose now that the price process is additive rather than multiplicative. This
means that the increments $\delta x_k$ themselves, rather than the relative increments,
should be considered as independent random variables. Using the change of
variable $x_k = (1+\rho)^k \tilde{x}_k$ in Eq. (4.31), one finds that $x_N$ can be written as:

$$x_N = x_0 (1+\rho)^N + \sum_{k=0}^{N-1} \delta x_k (1+\rho)^{N-k-1}. \tag{4.40}$$

When N is large, the difference $x_N - x_0(1+\rho)^N$ becomes, according to the CLT,
a Gaussian variable of mean zero (if m = 0) and of variance equal to:

$$c_2(T) = D\tau \sum_{\ell=0}^{N-1} (1+\rho)^{2\ell} \simeq DT \left[ 1 + \rho(N-1) + O(\rho^2 N^2) \right], \tag{4.41}$$

where Dτ is the variance of the individual increments, related to the volatility
through: $D \equiv \sigma^2 x_0^2$. The price, Eq. (4.33), then takes the following form:

$$C_G(x_0, x_s, T) = e^{-rT} \int_{x_s}^{\infty} \frac{x - x_s}{\sqrt{2\pi c_2(T)}} \exp\left( -\frac{(x - x_0 e^{rT})^2}{2 c_2(T)} \right) dx. \tag{4.42}$$

The price formula written down by Bachelier corresponds to the limit of short
maturity options, where all interest rate effects can be neglected. In this limit, the
above equation simplifies to:

$$C_G(x_0, x_s, T) \simeq \int_{x_s}^{\infty} (x - x_s) \frac{1}{\sqrt{2\pi DT}} \exp\left( -\frac{(x - x_0)^2}{2DT} \right) dx. \tag{4.43}$$

This equation can also be derived directly from the Black-Scholes price, Eq. (4.39),
in the small maturity limit, where relative price variations are small: $x_N/x_0 - 1 \ll 1$,
allowing one to write $y = \log(x/x_0) \simeq (x - x_0)/x_0$. As emphasized in
Section 1.3.2, this is the limit where the Gaussian and log-normal distributions
become very similar.
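
The Gaussian integral of Eq. (4.43) has a simple closed form, $(x_0 - x_s)\,\Phi(d) + \sqrt{DT}\,\varphi(d)$ with $d = (x_0 - x_s)/\sqrt{DT}$, where Φ and φ denote the standard normal cumulative distribution and density (notation introduced here for the illustration only). The sketch below, with function names of our choosing, compares it with the zero-rate Black-Scholes price for the parameters of the order-of-magnitude example.

```python
import math


def bachelier_call(x0, xs, T, D):
    """Closed form of Eq. (4.43); D is the variance of price changes per unit time."""
    s = math.sqrt(D * T)
    d = (x0 - xs) / s
    cdf = 0.5 * math.erfc(-d / math.sqrt(2.0))            # Phi(d)
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)  # phi(d)
    return (x0 - xs) * cdf + s * pdf


def bs_call_zero_rate(x0, xs, T, sigma):
    """Eq. (4.39) with r = 0, used only for the comparison."""
    s = sigma * math.sqrt(T)
    y = math.log(xs / x0)
    p_gt = lambda u: 0.5 * math.erfc(u / math.sqrt(2.0))  # P_G>(u)
    return x0 * p_gt((y - 0.5 * s * s) / s) - xs * p_gt((y + 0.5 * s * s) / s)


x0, sigma, T = 100.0, 0.01, 100.0
D = (sigma * x0) ** 2                                     # D = sigma^2 x0^2
c_g = bachelier_call(x0, 100.0, T, D)
c_bs = bs_call_zero_rate(x0, 100.0, T, sigma)
rel_diff = (c_bs - c_g) / c_g
```

At the money the two prices differ by less than one part in a thousand, which illustrates how close the Gaussian and log-normal descriptions are for these parameters.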

The dependence of $C_G(x_0, x_s, T)$ as a function of $x_s$ is shown in Figure 4.1,
where the numerical value of the relative difference between the Black-Scholes
and Bachelier price is also plotted.

In a more general additive model, when N is finite, one can reconstruct the
full distribution $P(x, N|x_0, 0)$ from the elementary distribution $P_1(\delta x)$ using
the convolution rule (slightly modified to take into account the extra factors
$(1+\rho)^{N-k-1}$ in Eq. (4.40)). When N is large, the Gaussian approximation
$P(x, N|x_0, 0) = P_G(x, N|x_0(1+\rho)^N, 0)$ becomes accurate, according to the CLT
discussed in Chapter 1.
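
The convolution rule can be sketched numerically for ρ = 0. The elementary distribution used below (a mixture of two Gaussians on a grid) is a hypothetical fat-tailed example chosen for illustration, not one taken from the text; the grid spacing, the number of steps and all names are likewise assumptions.

```python
import numpy as np


def n_step_distribution(p1, n_steps):
    """Distribution of the sum of n_steps iid increments (rho = 0), by repeated convolution."""
    p = np.array([1.0])
    for _ in range(n_steps):
        p = np.convolve(p, p1)
    return p


def gauss(x, s):
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


h = 0.05                                   # grid spacing, in points
K = 200                                    # elementary grid covers [-10, 10]
dx = h * np.arange(-K, K + 1)

# Hypothetical elementary distribution P_1: mostly narrow, with occasional large moves.
p1 = 0.9 * gauss(dx, 0.8) + 0.1 * gauss(dx, 2.0)
p1 /= p1.sum()                             # probabilities carried by the grid points

N = 20
pN = n_step_distribution(p1, N)
s_grid = h * np.arange(-N * K, N * K + 1)  # support of x_N - x_0

x0, xs = 100.0, 100.0
price = np.sum(np.maximum(x0 + s_grid - xs, 0.0) * pN)  # Eq. (4.33) with rho = 0

var_1 = np.sum(dx ** 2 * p1)                           # D * tau
gauss_price = np.sqrt(N * var_1 / (2.0 * np.pi))       # Gaussian at-the-money price
```

For this example the at-the-money price from the full distribution lies slightly below the Gaussian price with the same variance, as expected when the increments have positive kurtosis.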


As far as interest rate effects are concerned, it is interesting to note that both 
formulae, Eqs (4.39) and (4.42), can be written as $e^{-rT}$ times a function of $x_0 e^{rT}$,
that is of the forward price $F_0$. This is quite natural, since an option on a stock and
on a forward must have the same price, since they have the same pay-off at expiry. 
As will be discussed in Chapter 5, this is a rather general property, not related to 
any particular model for price fluctuations. 


Fig. 4.1. Price of a call of maturity T = 100 days as a function of the strike price $x_s$, in
the Bachelier model. The underlying stock is worth 100, and the daily volatility is 1%.
The interest rate is taken to be zero. Inset: relative difference between the log-normal,
Black-Scholes price $C_{BS}$ and the Gaussian, Bachelier price ($C_G$), for the same values of the
parameters.


Dynamic equation for the option price


It is easy to show directly that the Gaussian distribution:

$$P_G(x, T|x_0, 0) = \frac{1}{\sqrt{2\pi DT}} \exp\left( -\frac{(x - x_0)^2}{2DT} \right), \tag{4.44}$$

obeys the diffusion equation (or heat equation):

$$\frac{\partial P_G(x, T|x_0, 0)}{\partial T} = \frac{D}{2} \frac{\partial^2 P_G(x, T|x_0, 0)}{\partial x^2}, \tag{4.45}$$

with boundary conditions:

$$P_G(x, 0|x_0, 0) = \delta(x - x_0). \tag{4.46}$$

On the other hand, since $P_G(x, T|x_0, 0)$ only depends on the difference $x - x_0$, one has:

$$\frac{\partial^2 P_G(x, T|x_0, 0)}{\partial x^2} = \frac{\partial^2 P_G(x, T|x_0, 0)}{\partial x_0^2}. \tag{4.47}$$

Taking the derivative of Eq. (4.43) with respect to the maturity T, one finds

$$\frac{\partial C_G(x_0, x_s, T)}{\partial T} = \frac{D}{2} \frac{\partial^2 C_G(x_0, x_s, T)}{\partial x_0^2}, \tag{4.48}$$

with boundary conditions, for a zero maturity option:

$$C_G(x_0, x_s, 0) = \max(x_0 - x_s, 0). \tag{4.49}$$

The option price thus also satisfies the diffusion equation. We shall come back to this point
later, in Section 4.5.2; it is indeed essentially this equation that Black and Scholes derived
using stochastic calculus in 1973.
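
The diffusion equation (4.48) can be checked numerically with centred finite differences applied to the closed form of the Gaussian price, Eq. (4.43). The sketch below is illustrative only: the evaluation point, the step sizes and the function name are arbitrary choices of ours.

```python
import math


def bachelier_call(x0, xs, T, D):
    """Closed form of the Gaussian price, Eq. (4.43)."""
    s = math.sqrt(D * T)
    d = (x0 - xs) / s
    cdf = 0.5 * math.erfc(-d / math.sqrt(2.0))
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    return (x0 - xs) * cdf + s * pdf


D, xs = 1.0, 100.0          # variance per day and strike
x0, T = 103.0, 50.0         # point at which Eq. (4.48) is tested
h_T, h_x = 1e-3, 1e-2       # finite-difference steps

# Left-hand side of Eq. (4.48): derivative with respect to maturity.
dC_dT = (bachelier_call(x0, xs, T + h_T, D)
         - bachelier_call(x0, xs, T - h_T, D)) / (2.0 * h_T)

# Right-hand side: second derivative with respect to the current price x0.
d2C_dx2 = (bachelier_call(x0 + h_x, xs, T, D)
           - 2.0 * bachelier_call(x0, xs, T, D)
           + bachelier_call(x0 - h_x, xs, T, D)) / h_x ** 2

residual = dC_dT - 0.5 * D * d2C_dx2
```

The residual vanishes up to the discretization error of the finite differences.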


4.3.4 Real option prices, volatility smile and ‘implied’ kurtosis 
Stationary distributions and the smile curve 


We shall now come back to the case where the distribution of the price increments
$\delta x_k$ is arbitrary, for example a TLD. For simplicity, we set the interest rate ρ to
zero. (A non-zero interest rate can readily be taken into account by discounting
the price of the call on the forward contract.) The price difference is thus the
sum of N = T/τ iid random variables, to which the discussion of the CLT
presented in Chapter 1 applies directly. For large N, the terminal price distribution
$P(x, N|x_0, 0)$ becomes Gaussian, while for N finite, ‘fat tail’ effects are important
and the deviations from the CLT are noticeable. In particular, short maturity options
or out-of-the-money options, or options on very ‘jagged’ assets, are not adequately
priced by the Black-Scholes formula.

In practice, the market corrects for this difference empirically, by introducing
in the Black-Scholes formula an ad hoc ‘implied’ volatility Σ, different from
the ‘true’, historical volatility of the underlying asset. Furthermore, the value of
the implied volatility needed to price options of different strike prices $x_s$ and/or
maturities T properly is not constant, but rather depends both on T and $x_s$. This
defines a ‘volatility surface’ $\Sigma(x_s, T)$. It is usually observed that the larger the
difference between $x_s$ and $x_0$, the larger the implied volatility: this is the so-called
‘smile effect’ (Fig. 4.2). On the other hand, the longer the maturity T, the better
the Gaussian approximation; the smile thus tends to flatten out with maturity.

It is important to understand how traders on option markets use the Black-Scholes
formula. Rather than viewing it as a predictive theory for option prices
given an observed (historical) volatility, traders use the Black-Scholes formula
to translate back and forth between market prices (driven by supply and
demand) and an abstract parameter called the implied volatility. In itself this
transformation (between prices and volatilities) does not add any new information.
However, this approach is useful in practice, since the implied volatility varies
much less than option prices as a function of maturity and moneyness. The
Black-Scholes formula is thus viewed as a zero-th order approximation that
takes into account the gross features of option prices, the traders then correcting
for other effects by adjusting the implied volatility. This view is quite common
in experimental sciences where an a priori constant parameter of an approximate
theory is made into a varying effective parameter to describe more subtle
effects.
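
The translation between prices and implied volatilities amounts to inverting the Black-Scholes formula, Eq. (4.39), with respect to σ. Since the price increases with σ, a plain bisection is enough for a sketch. The function names, the bracketing interval and the iteration count below are assumptions made for this illustration, not a description of market practice.

```python
import math


def bs_call(x0, xs, T, sigma, r=0.0):
    """Black-Scholes call price, Eq. (4.39); sigma and r are per unit of T."""
    s = sigma * math.sqrt(T)
    y_s = math.log(xs / x0) - r * T
    p_gt = lambda u: 0.5 * math.erfc(u / math.sqrt(2.0))  # P_G>(u)
    return (x0 * p_gt((y_s - 0.5 * s * s) / s)
            - xs * math.exp(-r * T) * p_gt((y_s + 0.5 * s * s) / s))


def implied_vol(price, x0, xs, T, r=0.0, lo=1e-6, hi=1.0, n_iter=100):
    """Volatility that reproduces `price` in Eq. (4.39), found by bisection on [lo, hi]."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if bs_call(x0, xs, T, mid, r) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip: price an out-of-the-money call at 1.2% daily volatility, then invert.
market_price = bs_call(100.0, 105.0, 20.0, 0.012)
sigma_implied = implied_vol(market_price, 100.0, 105.0, 20.0)

# A higher quoted price for the same option maps to a higher implied volatility.
sigma_higher = implied_vol(1.1 * market_price, 100.0, 105.0, 20.0)
```

Repeating the inversion for each quoted strike and maturity produces the volatility surface discussed above.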
Fig. 4.2. ‘Implied volatility’ Σ as used by the market to account for non-Gaussian effects.
The larger the difference between $x_s$ and $x_0$, the larger the value of Σ: the curve has
the shape of a smile. The function plotted here is a simple parabola, which is obtained
theoretically by including the effect of a non-zero kurtosis $\kappa_1$ of the elementary price
increments. Note that for at-the-money options ($x_s = x_0$), the implied volatility is smaller
than the true volatility.


However, the use of the Black-Scholes formula (which assumes a constant 
volatility) with an ad hoc variable volatility is not very satisfactory from a 
theoretical point of view. Furthermore, this practice requires the manipulation 
of a whole volatility surface $\Sigma(x_s, T)$, which can deform in time; this is not
particularly convenient.

A simple calculation allows one to understand the origin and the shape of the
volatility smile. Let us assume that the maturity T is sufficiently large so that only
the kurtosis $\kappa_1$ of $P_1(\delta x)$ must be taken into account to measure the difference
with a Gaussian distribution.¹⁷ Using the results of Section 1.6.3, the formula,
Eq. (4.34), leads to:

$$\Delta C_\kappa(x_0, x_s, T) = \frac{\kappa_1 \tau}{24 T} \sqrt{\frac{DT}{2\pi}} \exp\left( -\frac{(x_s - x_0)^2}{2DT} \right) \left( \frac{(x_s - x_0)^2}{DT} - 1 \right), \tag{4.50}$$

where $D \equiv \sigma^2 x_0^2$ and $\Delta C_\kappa = C_\kappa - C_{\kappa=0}$.

17 We also assume here that the distribution is symmetrical ($P_1(\delta x) = P_1(-\delta x)$), which is usually justified
on short time scales. If this assumption is not adequate, one must include the skewness $\lambda_3$, leading to an
asymmetrical smile, cf. Eq. (4.56).

One can indeed transform the formula

$$C = \int_{x_s}^{\infty} (x' - x_s) P(x', T|x_0, 0)\, dx', \tag{4.51}$$

through an integration by parts

$$C = \int_{x_s}^{\infty} P_>(x', T|x_0, 0)\, dx'. \tag{4.52}$$

After changing variables $x' \to x' - x_0$, and using Eq. (1.69) of Section 1.6.3, one gets:

$$C = C_G + \frac{\sqrt{DT}}{\sqrt{N}} \int_{u_s}^{\infty} \frac{Q_1(u)}{\sqrt{2\pi}} e^{-u^2/2}\, du + \frac{\sqrt{DT}}{N} \int_{u_s}^{\infty} \frac{Q_2(u)}{\sqrt{2\pi}} e^{-u^2/2}\, du + \cdots \tag{4.53}$$

where $u_s \equiv (x_s - x_0)/\sqrt{DT}$. Now, noticing that:

$$Q_1(u)\, e^{-u^2/2} = \frac{\lambda_3}{6} \frac{d^2}{du^2} e^{-u^2/2}, \tag{4.54}$$

and

$$Q_2(u)\, e^{-u^2/2} = -\frac{\lambda_4}{24} \frac{d^3}{du^3} e^{-u^2/2} - \frac{\lambda_3^2}{72} \frac{d^5}{du^5} e^{-u^2/2}, \tag{4.55}$$

the integrations over u are readily performed, yielding

$$C = C_G + \sqrt{DT}\, \frac{e^{-u_s^2/2}}{\sqrt{2\pi}} \left( \frac{\lambda_3}{6\sqrt{N}} u_s + \frac{\lambda_4}{24N} (u_s^2 - 1) + \frac{\lambda_3^2}{72N} (u_s^4 - 6u_s^2 + 3) + \cdots \right), \tag{4.56}$$

which indeed coincides with Eq. (4.50) for $\lambda_3 = 0$. In general, one has $\lambda_3^2 \ll \lambda_4$; in this
case, the smile remains a parabola, but shifted and centred around $x_0(1 - 2\sigma T \lambda_3/\lambda_4)$.


Note that we refer here to additive (rather than multiplicative) price increments: 
the Black-Scholes volatility smile is often asymmetrical merely because of the use 
of a skewed log-normal distribution to describe a nearly symmetrical process. 

On the other hand, a simple calculation shows that the variation of
$C_{\kappa=0}(x_0, x_s, T)$ [as given by Eq. (4.43)] when the volatility changes by a small
quantity $\delta D = 2\sigma x_0^2 \delta\sigma$ is given by:

$$\delta C_{\kappa=0}(x_0, x_s, T) = \delta\sigma\, x_0 \sqrt{\frac{T}{2\pi}} \exp\left( -\frac{(x_s - x_0)^2}{2DT} \right). \tag{4.57}$$

The effect of a non-zero kurtosis $\kappa_1$ can thus be reproduced (to first order) by
a Gaussian pricing formula, but at the expense of using an effective volatility
$\Sigma(x_s, T) = \sigma + \delta\sigma$ given by:

$$\Sigma(x_s, T) = \sigma \left[ 1 + \frac{\kappa(T)}{24} \left( \frac{(x_s - x_0)^2}{DT} - 1 \right) \right], \tag{4.58}$$

with $\kappa(T) = \kappa_1/N$. This very simple formula, represented in Figure 4.2, allows one
to understand intuitively the shape and amplitude of the smile. For example, for a
daily kurtosis of $\kappa_1 = 10$, the volatility correction is on the order of $\delta\sigma/\sigma \simeq 17\%$
for out-of-the-money options such that $x_s - x_0 = 3\sqrt{DT}$, and for T = 20 days.
Note that the effect of the kurtosis is to reduce the implied at-the-money volatility
in comparison with its true value.

Fig. 4.3. Comparison between theoretical and observed option prices. Each point
corresponds to an option on the Bund (traded on the LIFFE in London), with different strike
prices and maturities. The x coordinate of each point is the theoretical price of the option
(determined using the empirical terminal price distribution), while the y coordinate is the
market price of the option. If the theory is good, one should observe a cloud of points
concentrated around the line y = x. The dotted line corresponds to a linear regression, and
gives y = 0.998x + 0.02 (in basis points units).
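
The parabolic smile of Eq. (4.58) is simple enough to evaluate directly. The short sketch below (function and argument names are ours) reproduces the numerical example just given, with the strike expressed through the reduced variable $u_s = (x_s - x_0)/\sqrt{DT}$.

```python
def smile_volatility(sigma, kappa_1, n_steps, u_s):
    """Effective volatility of Eq. (4.58), with kappa(T) = kappa_1 / N."""
    kappa_T = kappa_1 / n_steps
    return sigma * (1.0 + kappa_T / 24.0 * (u_s ** 2 - 1.0))


sigma = 0.01                                              # daily volatility of the example
otm = smile_volatility(sigma, kappa_1=10.0, n_steps=20, u_s=3.0)
atm = smile_volatility(sigma, kappa_1=10.0, n_steps=20, u_s=0.0)

otm_correction = otm / sigma - 1.0    # relative correction three standard deviations away
atm_correction = atm / sigma - 1.0    # relative correction at the money
```

The out-of-the-money correction is 1/6, i.e. about 17%, while the at-the-money implied volatility is reduced by about 2%.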

Figure 4.3 shows some ‘experimental’ data, concerning options on the Bund
(futures) contract, for which a weakly non-Gaussian model is satisfactory (cf.
Section 2.3). The associated option market is, furthermore, very liquid; this tends
to reduce the width of the bid-ask spread, and thus, probably, to decrease the
difference between the market price and a theoretical ‘fair’ price. This is not the
case for OTC options, when the overhead charged by the financial institution
writing the option can in some cases be very large, cf. Section 4.4.1 below. In
Figure 4.3, each point corresponds to an option of a certain maturity and strike
price. The coordinates of these points are the theoretical price, Eq. (4.34), along the
x axis (calculated using the historical distribution of the Bund), and the observed
market price along the y axis. If the theory is good, one should observe a cloud
of points concentrated around the line y = x. Figure 4.3 includes points from the
first half of 1995, which was rather ‘calm’, in the sense that the volatility remained
roughly constant during that period (see Fig. 4.4). Correspondingly, the assumption
that the process is stationary is reasonable.

The agreement between theoretical and observed prices is however much less
convincing if one uses data from 1994, when the volatility of interest rate markets
was high. The following subsection aims at describing these discrepancies in
greater detail.


Non-stationarity and ‘implied’ kurtosis 
A more precise analysis reveals that the scale of the fluctuations of the underlying
asset (here the Bund contract) itself varies noticeably around its mean value; these
‘scale fluctuations’ are furthermore correlated with the option prices. More precisely, one
postulates that the distribution of price increments δx has a constant shape, but a width $\gamma_k$
which is time dependent (cf. Section 2.4), i.e.:¹⁸

$$P_1(\delta x_k) \equiv \frac{1}{\gamma_k} P_{10}\left( \frac{\delta x_k}{\gamma_k} \right), \tag{4.59}$$

where $P_{10}$ is a distribution independent of k, and of width (for example measured by the
MAD) normalized to one. The absolute volatility of the asset is thus proportional to $\gamma_k$.
Note that if $P_{10}$ is Gaussian and time is continuous, this model is known as the stochastic
volatility Brownian motion.¹⁹ However, this assumption is not needed, and one can keep
$P_{10}$ arbitrary.

Figure 4.4 shows the evolution of the scale γ (filtered over 5 days in the past) as a
function of time and compares it to the implied volatility $\Sigma(x_s = x_0)$ extracted from
at-the-money options during the same period. One thus observes that the option prices
are rather well tracked by adjusting the factor $\gamma_k$ through a short-term estimate of the
volatility of the underlying asset. This means that the option prices primarily reflect the
quasi-instantaneous volatility of the underlying asset itself, rather than an ‘anticipated’
average volatility on the life time of the option.

It is interesting to notice that the mere existence of volatility fluctuations leads to a
non-zero kurtosis $\kappa_N$ of the asset fluctuations (see Section 2.4). This kurtosis has an anomalous
time dependence (cf. Eq. (2.17)), in agreement with the direct observation of the historical
kurtosis (Fig. 4.6). On the other hand, an ‘implied kurtosis’ can be extracted from the
market price of options, using Eq. (4.58) as a fit to the empirical smile, see Figure 4.5.
Remarkably enough, this implied kurtosis is in close correspondence with the historical
kurtosis; note that Figure 4.6 does not require any further adjustable parameter.
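
That a fluctuating scale alone produces a non-zero kurtosis can be seen in the simplest possible setting. Assume, for this illustration only, that $P_{10}$ is Gaussian and that the scale γ is drawn independently of the Gaussian noise from a discrete set of values. The excess kurtosis of a single increment is then $3(\langle\gamma^4\rangle/\langle\gamma^2\rangle^2 - 1)$, a standard property of Gaussian scale mixtures. The sketch below does not address the anomalous time dependence of $\kappa_N$, which depends on the correlations of γ discussed in Section 2.4.

```python
def mixture_kurtosis(gammas, weights):
    """Excess kurtosis of delta_x = gamma * xi, with xi a standard Gaussian variable
    and gamma drawn independently from `gammas` with probabilities `weights`."""
    m2 = sum(w * g ** 2 for g, w in zip(gammas, weights))
    m4 = sum(w * g ** 4 for g, w in zip(gammas, weights))
    return 3.0 * (m4 / m2 ** 2 - 1.0)


# Hypothetical two-state volatility: 'calm' and 'agitated' periods, equally likely.
kappa_two_state = mixture_kurtosis([0.5, 1.5], [0.5, 0.5])

# A constant scale gives back a Gaussian, hence zero excess kurtosis.
kappa_constant = mixture_kurtosis([1.0], [1.0])
```

With scales 0.5 and 1.5 the excess kurtosis is 1.92, although each increment is Gaussian once γ is known.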

As a conclusion of this section, it seems that market operators have empirically corrected 
the Black-Scholes formula to account for two distinct, but related effects: 


• The presence of market jumps, implying fat tailed distributions (κ > 0) of short-term
price increments. This effect is responsible for the volatility smile (and also, as we shall
discuss next, for the fact that options are risky).

• The fact that the volatility is not constant, but fluctuates in time, leading to an
anomalously slow decay of the kurtosis (slower than 1/N) and, correspondingly, to
a non-trivial deformation of the smile with maturity.


18 This hypothesis can be justified by assuming that the amplitude of the market moves is subordinated to the
volume of transactions, which itself is obviously time dependent.

19 In this context, the fact that the volatility is time dependent is called ‘heteroskedasticity’. ARCH models (Auto
Regressive Conditional Heteroskedasticity) and their relatives have been invented as a framework to model
such effects. See Section 2.9.
Fig. 4.4. Time dependence of the scale parameter γ, obtained from the analysis of the
intra-day fluctuations of the underlying asset (here the Bund), or from the implied volatility
of at-the-money options. More precisely, the historical determination of γ comes from the
daily average of the absolute value of the 5-min price increments. The two curves are then 
smoothed over 5 days. These two determinations are strongly correlated, showing that the 
option price mostly reflects the instantaneous volatility of the underlying asset itself. 


It is interesting to note that, through trial and error, the market as a whole has evolved
to allow for such non-trivial statistical features, at least on most actively traded markets.
This might be called ‘market efficiency’; but contrary to stock markets, where it is difficult
to judge whether the stock price is or is not the ‘true’ price (which might be an empty
concept), option markets offer a remarkable testing ground for this idea. It is also a nice
example of adaptation of a population (the traders) to a complex and hostile environment,
which has taken place in a few decades!


In summary, part of the information contained in the implied volatility surface
$\Sigma(x_s, T)$ used by market participants can be explained by an adequate statistical
model of the underlying asset fluctuations. In particular, in weakly non-Gaussian
markets, an important parameter is the time-dependent kurtosis, see Eq. (4.58). The
anomalous maturity dependence of this kurtosis encodes the fact that the volatility
is itself time dependent.


4.4 Optimal strategy and residual risk 
4.4.1 Introduction 


In the above discussion, we have chosen a model of price increments such that the
cost of the hedging strategy (i.e. the term $\langle \psi_k \delta x_k \rangle$) could be neglected, which is
justified if the excess return mτ is zero, or else for short maturity options. Beside the
fact that the correction to the option price induced by a non-zero return is important
to assess (this will be done in the next section), the determination of an ‘optimal’
hedging strategy and the corresponding minimal residual risk is crucial for the
following reason. Suppose that an adequate measure of the risk taken by the writer
of the option is given by the variance of the global wealth balance associated to the
operation, i.e.:²⁰

$$\mathcal{R} = \sqrt{\langle \Delta W^2[\phi] \rangle}. \tag{4.60}$$

As we shall find below, there is a special strategy $\phi^*$ such that the above quantity
reaches a minimum value. Within the hypotheses of the Black-Scholes model, this
minimum risk is even, rather surprisingly, strictly zero. Under less restrictive and
more realistic hypotheses, however, the residual risk $\mathcal{R}^* \equiv \sqrt{\langle \Delta W^2[\phi^*] \rangle}$ actually
amounts to a substantial fraction of the option price itself. It is thus rather natural
for the writer of the option to try to reduce further the risk by overpricing the option,
adding to the ‘fair price’ a risk premium proportional to $\mathcal{R}^*$, in such a way that the
probability of eventually losing money is reduced. Stated otherwise, option writing
being an essentially risky operation, it should also be, on average, profitable.
Therefore, a market maker on option markets will offer to buy and to sell options
at slightly different prices (the ‘bid-ask’ spread), centred around the fair price C.
The amplitude of the spread is presumably governed by the residual risk, and is
thus $\pm\lambda\mathcal{R}^*$, where λ is a certain numerical factor, which measures the price of
risk. The search for minimum risk corresponds to the need of keeping the bid-ask
spread as small as possible, because several competing market makers are present.
One therefore expects the coefficient λ to be smaller on liquid markets. On the
contrary, the writer of an OTC option usually claims a rather high risk premium
λ.

20 The case where a better measure of risk is the loss probability or the value-at-risk will be discussed below.

Fig. 4.5. Implied Black-Scholes volatility and fitted parabolic smile. The circles
correspond to all quoted prices on 26th April, 1995, on options of 1-month maturity. The error
bars correspond to an error on the price of ±1 basis point. The curvature of the parabola
allows one to extract the implied kurtosis κ(T) using Eq. (4.58).

Fig. 4.6. Plot (in log-log coordinates) of the average implied kurtosis $\kappa_{imp}$ (determined
by fitting the implied volatility for a fixed maturity by a parabola) and of the empirical
kurtosis $\kappa_N$ (determined directly from the historical movements of the Bund contract), as
a function of the reduced time scale N = T/τ, τ = 30 min. All transactions of options
on the Bund future from 1993 to 1995 were analysed along with 5 minute tick data of
the Bund future for the same period. We show for comparison a power-law fit in N
(dark line), as suggested by the results of Section 2.4. A fit with an exponentially decaying
volatility correlation function is however also acceptable (dash line).

Let us illustrate this idea by Figures 4.7 and 4.8, generated using real market
data. We show the histogram of the global wealth balance ΔW, corresponding
to an at-the-money option, of maturity equal to 3 months, in the case of a bare
position (φ = 0), and in the case where one follows the optimal strategy ($\phi = \phi^*$)
prescribed below. The fair price of the option is such that $\langle \Delta W \rangle = 0$ (vertical thick
line). It is clear that without hedging, option writing is a very risky operation. The
optimal hedge substantially reduces the risk, though the residual risk remains quite
high. Increasing the price of the option by an amount $\lambda\mathcal{R}^*$ corresponds to a shift
of the vertical thick line to the left of Figure 4.8, thereby reducing the weight of
unfavourable events.

Fig. 4.7. Histogram of the global wealth balance ΔW associated with the writing of an
at-the-money option of maturity equal to 60 trading days. The price is fixed such that on
average $\langle \Delta W \rangle$ is zero (vertical line). This figure corresponds to the case where the option
is not hedged (φ = 0). The RMS of the distribution is 1.04, to be compared with the
price of the option C = 0.79. The ‘peak’ at 0.79 thus corresponds to the cases where
the option is not exercised, which happens with a probability close to 1/2 for at-the-money
options.

Fig. 4.8. Histogram of the global wealth balance ΔW associated to the writing of the same
option, with a price still fixed such that $\langle \Delta W \rangle = 0$ (vertical line), with the same horizontal
scale as in the previous figure. This figure shows the effect of adopting an optimal hedge
($\phi = \phi^*$), recalculated every half-hour, and in the absence of transaction costs. The RMS
$\mathcal{R}^*$ is clearly smaller (= 0.28), but non-zero. Note that the distribution is skewed towards
ΔW < 0. Increasing the price of the option by $\lambda\mathcal{R}^*$ amounts to diminishing the probability
of losing money (for the writer of the option), P(ΔW < 0).


Another way of expressing this idea is to use $\mathcal{R}^*$ as a scale to measure the
difference between the market price of an option $C_M$ and its theoretical price:

$$\lambda = \frac{C_M - C}{\mathcal{R}^*}. \tag{4.61}$$

An option with a large value of λ is an expensive option, which includes a large
risk premium.
4.4.2 A simple case


Let us now discuss how an optimal hedging strategy can be constructed, by
focusing first on the simple case where the amount of the underlying asset held in
the portfolio is fixed once and for all when the option is written, i.e. at $t = 0$. This
extreme case would correspond to very high transaction costs, so that changing
one’s position on the market is very costly. We shall furthermore assume, for
simplicity, that interest rate effects are negligible, i.e. $\rho = 0$. The global wealth
balance, Eq. (4.28), then reads:

$$\Delta W = \mathcal{C} - \max(x_N - x_s, 0) + \phi \sum_{k=0}^{N-1} \delta x_k. \qquad (4.62)$$
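In the case where the average return is zero ($\langle \delta x_k \rangle = 0$) and where the increments
are uncorrelated (i.e. $\langle \delta x_k \delta x_l \rangle = D\tau\,\delta_{k,l}$), the variance of the final wealth,
$\mathcal{R}^2 = \langle \Delta W^2 \rangle - \langle \Delta W \rangle^2$, reads:

$$\mathcal{R}^2 = N D\tau\,\phi^2 - 2\phi\,\langle (x_N - x_0) \max(x_N - x_s, 0) \rangle + \mathcal{R}_0^2, \qquad (4.63)$$

where $\mathcal{R}_0$ is the intrinsic risk, associated to the ‘bare’, unhedged option ($\phi = 0$):

$$\mathcal{R}_0^2 = \langle \max(x_N - x_s, 0)^2 \rangle - \langle \max(x_N - x_s, 0) \rangle^2. \qquad (4.64)$$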


The dependence of the risk on $\phi$ is shown in Figure 4.9, and indeed reveals the
existence of an optimal value $\phi = \phi^*$ for which $\mathcal{R}$ takes a minimum value. Taking
the derivative of Eq. (4.63) with respect to $\phi$,

$$\left. \frac{d\mathcal{R}}{d\phi} \right|_{\phi = \phi^*} = 0, \qquad (4.65)$$

one finds:

$$\phi^* = \frac{1}{D\tau N} \int_{x_s}^{\infty} (x - x_s)(x - x_0)\, P(x, N|x_0, 0)\, dx, \qquad (4.66)$$

thereby fixing the optimal hedging strategy within this simplified framework.


Interestingly, if $P(x, N|x_0, 0)$ is Gaussian (which becomes a better approximation
as $N = T/\tau$ increases), one can write:

$$\phi^* = \frac{1}{DT} \int_{x_s}^{\infty} \frac{(x - x_s)}{\sqrt{2\pi DT}}\,(x - x_0) \exp\left[ -\frac{(x - x_0)^2}{2DT} \right] dx
= -\int_{x_s}^{\infty} \frac{(x - x_s)}{\sqrt{2\pi DT}}\, \frac{\partial}{\partial x} \exp\left[ -\frac{(x - x_0)^2}{2DT} \right] dx, \qquad (4.67)$$

giving, after an integration by parts: $\phi^* \equiv \mathcal{P}$, or else the probability, calculated
from $t = 0$, that the option is exercised at maturity: $\mathcal{P} \equiv \int_{x_s}^{\infty} P(x, N|x_0, 0)\, dx$.
Fig. 4.9. Dependence of the residual risk $\mathcal{R}$ as a function of the strategy $\phi$, in the simple
case where this strategy does not vary with time. Note that $\mathcal{R}$ is minimum for a well-defined
value of $\phi$.


Hence, buying a certain fraction of the underlying allows one to reduce the risk
associated to the option. As we have formulated it, risk minimization appears as a
variational problem. The result therefore depends on the family of ‘trial’ strategies
$\phi$. A natural idea is thus to generalize the above procedure to the case where the
strategy $\phi$ is allowed to vary both with time, and with the price of the underlying
asset. In other words, one can certainly do better than holding a certain fixed
quantity of the underlying asset, by adequately readjusting this quantity in the
course of time.
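
The static hedge of Eqs (4.63) to (4.66) is easy to check numerically. The sketch below is only an illustration under stated assumptions: the terminal distribution is taken to be Gaussian with variance $DT$, integrals are evaluated with a plain trapezoidal rule, and the names `static_hedge`, `integrate` and `gaussian_pdf` are hypothetical helpers, not anything defined in the text. It returns $\phi^*$ from Eq. (4.66), the exercise probability $\mathcal{P}$, and the parabola $\mathcal{R}^2(\phi)$ of Eq. (4.63).

```python
import math


def gaussian_pdf(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def integrate(f, a, b, n=20000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def static_hedge(x0, xs, var_T):
    """Static hedge for a call of strike xs; var_T plays the role of D*T (assumed Gaussian)."""
    sd = math.sqrt(var_T)
    upper = max(x0, xs) + 10.0 * sd          # far enough in the tail to truncate safely

    def pdf(x):
        return gaussian_pdf(x, x0, var_T)

    payoff_mean = integrate(lambda x: (x - xs) * pdf(x), xs, upper)
    payoff_sq = integrate(lambda x: (x - xs) ** 2 * pdf(x), xs, upper)
    cross = integrate(lambda x: (x - xs) * (x - x0) * pdf(x), xs, upper)

    r0_sq = payoff_sq - payoff_mean ** 2     # intrinsic risk, Eq. (4.64)
    phi_star = cross / var_T                 # optimal static hedge, Eq. (4.66)
    exercise_prob = 0.5 * math.erfc((xs - x0) / (sd * math.sqrt(2.0)))

    def risk_sq(phi):
        """Variance of the wealth balance as a function of phi, Eq. (4.63)."""
        return var_T * phi ** 2 - 2.0 * phi * cross + r0_sq

    return phi_star, exercise_prob, risk_sq


phi_star, exercise_prob, risk_sq = static_hedge(x0=100.0, xs=100.0, var_T=4.0)
print(phi_star, exercise_prob, risk_sq(0.0), risk_sq(phi_star))
```

For an at-the-money option the two numbers $\phi^*$ and $\mathcal{P}$ should agree (both close to 1/2), and the hedged variance should be visibly smaller than the unhedged one, although far from zero.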


4.4.3 General case: ‘$\Delta$’ hedging


If one writes again the complete wealth balance, Eq. (4.26), as:

$$\Delta W = \mathcal{C}(1 + \rho)^N - \max(x_N - x_s, 0) + \sum_{k=0}^{N-1} \psi_k^N(x_k)\, \delta x_k, \qquad (4.68)$$

the calculation of $\langle \Delta W^2 \rangle$ involves, as in the simple case treated above, three types
of terms: quadratic in $\psi$, linear in $\psi$, and independent of $\psi$. The last family of
terms can thus be forgotten in the minimization procedure. The first two terms of
$\mathcal{R}^2$ are:

$$\sum_{k=0}^{N-1} \langle (\psi_k^N)^2 \rangle \langle \delta x_k^2 \rangle - 2 \sum_{k=0}^{N-1} \langle \psi_k^N\, \delta x_k \max(x_N - x_s, 0) \rangle, \qquad (4.69)$$
where we have used $\langle \delta x_k \rangle = 0$ and assumed the increments to be of finite variance
and uncorrelated (but not necessarily stationary nor independent):²¹

$$\langle \delta x_k \delta x_l \rangle = \langle \delta x_k^2 \rangle\, \delta_{k,l}. \qquad (4.70)$$


We introduce again the distribution $P(x, k|x_0, 0)$ for arbitrary intermediate times
$k$. The strategy $\psi_k^N$ depends now on the value of the price $x_k$. One can express the
terms appearing in Eq. (4.69) in the following form:

$$\langle (\psi_k^N)^2 \rangle \langle \delta x_k^2 \rangle = \int [\psi_k^N(x)]^2\, P(x, k|x_0, 0)\, \langle \delta x_k^2 \rangle\, dx, \qquad (4.71)$$

and

$$\langle \psi_k^N\, \delta x_k \max(x_N - x_s, 0) \rangle = \int \psi_k^N(x)\, P(x, k|x_0, 0)\, dx
\times \int_{x_s}^{+\infty} \langle \delta x_k \rangle_{(x,k) \to (x',N)}\, (x' - x_s)\, P(x', N|x, k)\, dx', \qquad (4.72)$$

where the notation $\langle \delta x_k \rangle_{(x,k) \to (x',N)}$ means that the average of $\delta x_k$ is restricted to
those trajectories which start from point $x$ at time $k$ and end at point $x'$ at time
$N$. Without this restriction, the average would of course be (within the present
hypothesis) zero.

The functions $\psi_k^N(x)$, for $k = 1, 2, \ldots, N$, must be chosen so that the risk $\mathcal{R}$
is as small as possible. Technically, one must ‘functionally’ minimize Eq. (4.69).
We shall not try to justify this procedure mathematically; one can simply imagine
that the integrals over $x$ are discrete sums (which is actually true, since prices
are expressed in cents and therefore are not continuous variables). The function
$\psi_k^N(x)$ is thus determined by the discrete set of values of $\psi_k^N(i)$. One can then
take usual derivatives of Eq. (4.69) with respect to these $\psi_k^N(i)$. The continuous
limit, where the points $i$ become infinitely close to one another, allows one to
define the functional derivative $\partial / \partial \psi_k^N(x)$, but it is useful to keep in mind its
discrete interpretation. Using Eqs (4.71) and (4.72), one thus finds the following
fundamental equation:

$$\frac{\partial \mathcal{R}^2}{\partial \psi_k^N(x)} = 2 \psi_k^N(x)\, P(x, k|x_0, 0)\, \langle \delta x_k^2 \rangle
- 2 P(x, k|x_0, 0) \int_{x_s}^{+\infty} \langle \delta x_k \rangle_{(x,k) \to (x',N)}\, (x' - x_s)\, P(x', N|x, k)\, dx'. \qquad (4.73)$$


Setting this functional derivative to zero then provides a general and rather explicit
expression for the optimal strategy $\psi_k^{N*}(x)$, where the only assumption is that the
increments $\delta x_k$ are uncorrelated (cf. Eq. (4.70)):²²

$$\psi_k^{N*}(x) = \frac{1}{\langle \delta x_k^2 \rangle} \int_{x_s}^{+\infty} \langle \delta x_k \rangle_{(x,k) \to (x',N)}\, (x' - x_s)\, P(x', N|x, k)\, dx'. \qquad (4.74)$$

21 The case of correlated Gaussian increments is considered in [Bouchaud and Sornette].
This formula can be simplified in the case where the increments $\delta x_k$ are identically
distributed (of variance $D\tau$) and when the interest rate is zero. As shown in
Appendix D, one then has:

$$\langle \delta x_k \rangle_{(x,k) \to (x',N)} = \frac{x' - x}{N - k}. \qquad (4.75)$$

The intuitive interpretation of this result is that all the $N - k$ increments $\delta x_l$ along
the trajectory $(x, k) \to (x', N)$ contribute on average equally to the overall price
change $x' - x$. The optimal strategy then finally reads:

$$\phi_k^{N*}(x) = \int_{x_s}^{+\infty} \frac{x' - x}{D\tau (N - k)}\, (x' - x_s)\, P(x', N|x, k)\, dx'. \qquad (4.76)$$
We leave to the reader to show that the above expression varies monotonically
from $\phi_k^{N*}(-\infty) = 0$ to $\phi_k^{N*}(+\infty) = 1$; this result holds without any further
assumption on $P(x', N|x, k)$. If $P(x', N|x, k)$ is well approximated by a Gaussian,
the following identity then holds true:

$$\frac{x' - x}{D\tau (N - k)}\, P_G(x', N|x, k) \equiv \frac{\partial P_G(x', N|x, k)}{\partial x}. \qquad (4.77)$$

The comparison between Eqs (4.76) and (4.33) then leads to the so-called ‘Delta’
hedging of Black and Scholes:

$$\phi_k^{N*}(x = x_k) = \frac{\partial \mathcal{C}[x, x_s, N - k]}{\partial x} \equiv \Delta(x, N - k). \qquad (4.78)$$


One can actually show that this result is true even if the interest rate is non-zero 
(see Appendix D), as well as if the relative returns (rather than the absolute returns) 
are independent Gaussian variables, as assumed in the Black-Scholes model. 
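
As a numerical illustration of Eqs (4.76) to (4.78), the sketch below evaluates the integral of Eq. (4.76) by quadrature for a Gaussian $P(x', N|x, k)$ and compares it with the closed-form Delta of the additive Gaussian model, i.e. the cumulative normal distribution evaluated at $(x - x_s)/\sqrt{D\tau(N-k)}$. The function names and the parameter `var_left` (standing for $D\tau(N-k)$) are hypothetical, and zero interest rates are assumed.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def optimal_hedge(x, xs, var_left, n=20000):
    """Eq. (4.76) by trapezoidal quadrature, assuming a Gaussian P(x', N | x, k)."""
    sd = math.sqrt(var_left)
    lo, hi = xs, max(x, xs) + 10.0 * sd
    h = (hi - lo) / n

    def integrand(xp):
        pdf = math.exp(-(xp - x) ** 2 / (2.0 * var_left)) / math.sqrt(2.0 * math.pi * var_left)
        return (xp - x) * (xp - xs) / var_left * pdf

    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, n):
        total += integrand(lo + i * h)
    return total * h


def gaussian_delta(x, xs, var_left):
    """Delta of Eq. (4.78) in the additive Gaussian model."""
    return normal_cdf((x - xs) / math.sqrt(var_left))


for price in (94.0, 97.0, 100.0, 103.0, 106.0):
    print(price, optimal_hedge(price, 100.0, 4.0), gaussian_delta(price, 100.0, 4.0))
```

The two columns should coincide, and the hedge should rise monotonically from values close to 0 deep out-of-the-money to values close to 1 deep in-the-money.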
Equation (4.78) has a very simple (perhaps sometimes misleading) interpretation.
If between time $k$ and $k + 1$ the price of the underlying asset varies by a small
amount $dx_k$, then to first order in $dx_k$, the variation of the option price is given
by $\Delta[x, x_s, N - k]\, dx_k$ (using the very definition of $\Delta$). This variation is therefore
exactly compensated by the gain or loss of the strategy, i.e. $\phi_k^{N*}(x = x_k)\, dx_k$. In
other words, $\phi_k^{N*} = \Delta(x_k, N - k)$ appears to be a perfect hedge, which ensures that
the portfolio of the writer of the option does not vary at all with time (cf. below,
when we will discuss the differential approach of Black and Scholes). However, as
we discuss now, the relation between the optimal hedge and $\Delta$ is not general, and
does not hold for non-Gaussian statistics.

22 This is true within the variational space considered here, where $\phi$ only depends on $x$ and $t$. If some volatility
correlations are present, one could in principle consider strategies that depend on the past values of the
volatility.


Cumulant corrections to $\Delta$ hedging


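More generally, using Fourier transforms, one can express Eq. (4.76) as a cumulant
expansion. Using the definition of the cumulants of $P$, one has:

$$(x' - x)\, P(x', N|x, 0) \equiv \frac{1}{2\pi} \int \hat{P}_N(z)\, \frac{\partial}{\partial(-iz)}\, e^{-iz(x' - x)}\, dz
= \sum_{n=2}^{\infty} \frac{(-1)^{n-1} c_{n,N}}{(n-1)!}\, \frac{\partial^{n-1}}{\partial x'^{\,n-1}} P(x', N|x, 0), \qquad (4.79)$$

where $\log \hat{P}_N(z) = \sum_{n=2}^{\infty} (iz)^n c_{n,N}/n!$. Assuming that the increments are independent,
one can use the additivity property of the cumulants discussed in Chapter 1, i.e. $c_{n,N} \equiv
N c_{n,1}$, where the $c_{n,1}$ are the cumulants at the elementary time scale $\tau$. Since $P(x', N|x, 0)$
depends only on $x' - x$, each derivative with respect to $x'$ can be traded for minus a derivative
with respect to $x$, which removes the alternating signs. One finds the
following general formula for the optimal strategy:²³

$$\phi_N^*(x) = \frac{1}{D\tau} \sum_{n=2}^{\infty} \frac{c_{n,1}}{(n-1)!}\, \frac{\partial^{n-1} \mathcal{C}[x, x_s, N]}{\partial x^{n-1}}, \qquad (4.80)$$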


where $c_{2,1} \equiv D\tau$, $c_{4,1} \equiv \kappa_1 [c_{2,1}]^2$, etc. In the case of a Gaussian distribution, $c_{n,1} \equiv 0$
for all $n \geq 3$, and one thus recovers the previous ‘Delta’ hedging. If the distribution of the
elementary increments $\delta x_k$ is symmetrical, $c_{3,1} = 0$. The first correction is then of order
$c_{4,1}/c_{2,1} \times (1/\sqrt{D\tau N})^2 \equiv \kappa_1/N$, where we have used the fact that $\mathcal{C}[x, x_s, N]$ typically
varies on the scale $x \simeq \sqrt{D\tau N}$, which allows us to estimate the order of magnitude of its
derivatives with respect to $x$. Equation (4.80) shows that in general, the optimal strategy
is not simply given by the derivative of the price of the option with respect to the price of
the underlying asset.

It is interesting to compute the difference between $\phi^*$ and the Black-Scholes strategy
as used by the traders, $\phi_M^*$, which takes into account the implied volatility of the option
$\Sigma(x_s, T = N\tau)$:²⁴

$$\phi_M^*(x, \Sigma) \equiv \left. \frac{\partial \mathcal{C}[x, x_s, T]}{\partial x} \right|_{\sigma = \Sigma}. \qquad (4.81)$$


If one chooses the implied volatility in such a way that the ‘correct’ price (cf. Eq. (4.58)
above) is reproduced, one can show that:

$$\phi^*(x) = \phi_M^*(x, \Sigma) + \frac{\kappa_1 \tau}{12 T}\, \frac{(x_s - x_0)}{\sqrt{2\pi DT}} \exp\left[ -\frac{(x_s - x_0)^2}{2DT} \right], \qquad (4.82)$$

where higher-order cumulants are neglected. The Black-Scholes strategy, even calculated
with the implied volatility, is thus not the optimal strategy when the kurtosis is non-zero.
However, the difference between the two reaches a maximum when $x_s - x_0 = \sqrt{DT}$, and
is equal to:

$$\delta\phi^* \simeq 0.02\, \frac{\kappa_1}{N}, \qquad T = N\tau. \qquad (4.83)$$

For a daily kurtosis of $\kappa_1 = 10$ and for $N = 20$ days, one finds that the first correction to
the optimal hedge is on the order of 0.01, see also Figure 4.10. As will be discussed below,
the use of such an approximate hedging induces a rather small increase of the residual
risk.

23 This formula can actually be generalized to the case where the volatility is time dependent, see Appendix D
and Section 5.2.3.

24 Note that the market does not compute the total derivative of the option price with respect to the underlying
price, which would also include the derivative of the implied volatility.

Fig. 4.10. Three hedging strategies for an option on an asset of terminal value $x_N$
distributed according to a truncated Lévy distribution, of kurtosis 5, as a function of
the strike price $x_s$. $\phi^*$ denotes the optimal strategy, calculated from Eq. (4.76), $\phi_M$ is
the Black-Scholes strategy computed with the appropriate implied volatility, such as to
reproduce the correct option price, and $\mathcal{P}$ is the probability that the option is exercised
at expiry, which would be the optimal strategy if the fluctuations were Gaussian. These
three strategies are actually quite close to each other (they coincide at-the-money, and deep
in-the-money and out-the-money). Note however that, interestingly, the variations of $\phi^*$
with the price are slower (smaller $\Gamma$).
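
The size of the correction quoted in Eq. (4.83) can be recovered with a few lines of arithmetic. The sketch below assumes that the kurtosis term of Eq. (4.82) reads $(\kappa_1/12N)\, u\, e^{-u^2/2}/\sqrt{2\pi}$ in terms of the reduced moneyness $u = (x_s - x_0)/\sqrt{DT}$ (this is our reading of the formula, with $\tau/T = 1/N$); the function names are hypothetical.

```python
import math


def hedge_correction(u, kappa_1, n_steps):
    """Kurtosis correction phi* - phi_M, with u = (x_s - x_0) / sqrt(D T)."""
    return kappa_1 / (12.0 * n_steps) * u * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def max_hedge_correction(kappa_1, n_steps, grid=4001):
    """Locate the maximum of the correction on a grid of u in [0, 4]."""
    us = [4.0 * i / (grid - 1) for i in range(grid)]
    best_u = max(us, key=lambda u: hedge_correction(u, kappa_1, n_steps))
    return best_u, hedge_correction(best_u, kappa_1, n_steps)


u_max, delta_phi = max_hedge_correction(kappa_1=10.0, n_steps=20)
print(u_max, delta_phi)
```

The maximum sits at $u = 1$, i.e. $x_s - x_0 = \sqrt{DT}$, and for $\kappa_1 = 10$, $N = 20$ the correction comes out close to 0.01, in line with the estimate quoted in the text.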


4.4.4 Global hedging/instantaneous hedging 


It is important to stress that the above procedure, where the global risk associated
to the option (calculated as the variance of the final wealth balance) is minimized, is
equivalent to the minimization of an ‘instantaneous’ hedging error, at least when
the increments are uncorrelated. The latter consists in minimizing the difference of
value of a position which is short one option and $\phi$ long in the underlying stock,
between two consecutive times $k\tau$ and $(k + 1)\tau$. This is closer to the concern
of many option traders, since the value of their option books is calculated every
day: the risk is estimated in a ‘marked to market’ way. If the price increments are
uncorrelated, we show below that the optimal strategy is indeed identical to the
‘global’ one discussed above.


The change of wealth between times $k$ and $k + 1$ is, in the absence of interest rates, given
by:

$$\delta W_k = \mathcal{C}_k - \mathcal{C}_{k+1} + \phi_k(x_k)\, \delta x_k. \qquad (4.84)$$

The global wealth balance is simply given by the sum over $k$ of all these $\delta W_k$. Let us now
calculate the part of $\langle \delta W_k^2 \rangle$ which depends on $\phi_k$. Using the fact that the increments are
uncorrelated, one finds:

$$\phi_k^2(x_k)\, D\tau - 2 \phi_k(x_k)\, \langle \mathcal{C}_{k+1}\, \delta x_k \rangle. \qquad (4.85)$$

Using now the explicit expression for $\mathcal{C}_{k+1}$:

$$\mathcal{C}_{k+1} = \int_{x_s}^{\infty} (x' - x_s)\, P(x', N|x_{k+1}, k+1)\, dx', \qquad (4.86)$$

one sees that the second term in the above expression contains the following average:

$$\int (x_{k+1} - x_k)\, P(x_{k+1}, k+1|x_k, k)\, P(x', N|x_{k+1}, k+1)\, dx_{k+1}. \qquad (4.87)$$

Using the methods of Appendix D, one can show that the above average is equal to
$[(x' - x_k)/(N - k)]\, P(x', N|x_k, k)$. Therefore, the part of the risk that depends on $\phi_k$ is given by:

$$\phi_k^2(x_k)\, D\tau - 2 \phi_k(x_k) \int_{x_s}^{\infty} (x' - x_s)\, \frac{(x' - x_k)}{N - k}\, P(x', N|x_k, k)\, dx'. \qquad (4.88)$$

Taking the derivative of this expression with respect to $\phi_k$ finally leads to the optimal
strategy, Eq. (4.76), above.


4.4.5 Residual risk: the Black-Scholes miracle 


We shall now compute the residual risk $\mathcal{R}^*$ obtained by replacing $\phi$ by $\phi^*$ in Eq.
(4.69). One finds:

$$\mathcal{R}^{*2} = \mathcal{R}_0^2 - D\tau \sum_{k=0}^{N-1} \int P(x, k|x_0, 0)\, [\phi_k^{N*}(x)]^2\, dx, \qquad (4.89)$$

where $\mathcal{R}_0$ is the unhedged risk. The Black-Scholes ‘miracle’ is the following: in the
case where $P$ is Gaussian, the two terms in the above equation exactly compensate
in the limit where $\tau \to 0$, with $N\tau = T$ fixed.²⁵

25 The ‘zero-risk’ property is true both within an ‘additive’ Gaussian model and the Black-Scholes multiplicative
model, or for more general continuous-time Brownian motions. This property is more easily obtained using
Ito’s stochastic differential calculus, see below. The latter formalism is however only valid in the continuous-
time limit ($\tau = 0$).
This ‘miraculous’ compensation is due to a very peculiar property of the Gaussian
distribution, for which:

$$P_G(x_1, T|x_0, 0)\, \delta(x_1 - x_2) - P_G(x_1, T|x_0, 0)\, P_G(x_2, T|x_0, 0) =
D \int_0^T dt \int dx\, P_G(x, t|x_0, 0)\, \frac{\partial P_G(x_1, T|x, t)}{\partial x}\, \frac{\partial P_G(x_2, T|x, t)}{\partial x}. \qquad (4.90)$$

Integrating this equation with respect to $x_1$ and $x_2$ after multiplication by the corresponding
payoff functions, the left-hand side gives $\mathcal{R}_0^2$. As for the right-hand side, one recognizes
the limit of $D\tau \sum_{k=0}^{N-1} \int P(x, k|x_0, 0)\, [\phi_k^{N*}(x)]^2\, dx$ when $\tau = 0$, where the sum becomes an
integral.

Therefore, in the continuous-time Gaussian case, the $\Delta$-hedge of Black and
Scholes allows one to eliminate completely the risk associated with an option, as
was the case for the forward contract. However, contrary to the forward contract
where the elimination of risk is possible whatever the statistical nature of the
fluctuations, the result of Black and Scholes is only valid in a very special limiting
case. For example, as soon as the elementary time scale $\tau$ is finite (which is
the case in reality), the residual risk $\mathcal{R}^*$ is non-zero even if the increments are
Gaussian. The calculation is easily done in the limit where $\tau$ is small compared
to $T$: the residual risk then comes from the difference between a continuous
integral $D \int_0^T dt' \int P_G(x, t'|x_0, 0)\, \phi^{*2}(x, t')\, dx$ (which is equal to $\mathcal{R}_0^2$) and the
corresponding discrete sum appearing in Eq. (4.89). This difference is given by
the Euler-Maclaurin formula and is equal to:

$$\mathcal{R}^{*2} = \frac{D\tau}{2}\, \mathcal{P}(1 - \mathcal{P}) + O(\tau^2), \qquad (4.91)$$

where $\mathcal{P}$ is the probability (at $t = 0$) that the option is exercised at expiry ($t = T$).²⁶
In the limit $\tau \to 0$, one indeed recovers $\mathcal{R}^* = 0$, which also holds if $\mathcal{P} \to 0$
or 1, since the outcome of the option then becomes certain. However, Eq. (4.91)
already shows that in reality, the residual risk is not small. Take for example an
at-the-money option, for which $\mathcal{P} = \frac{1}{2}$. The comparison between the residual risk
and the price of the option allows one to define a ‘quality’ ratio $\mathcal{Q}$ for the hedging
strategy:

$$\mathcal{Q} \equiv \frac{\mathcal{R}^*}{\mathcal{C}} \simeq \sqrt{\frac{\pi}{4N}}, \qquad (4.92)$$

with $N = T/\tau$. For an option of maturity 1 month, rehedged daily, $N \simeq 25$.
Assuming Gaussian fluctuations, the quality ratio is then $\mathcal{Q} \simeq 0.2$. In other words,
the residual risk is one-fifth of the price of the option itself. Even if one rehedges
every 30 min in a Gaussian world, the quality ratio is already $\mathcal{Q} \simeq 0.05$. If the
increments are not Gaussian, then $\mathcal{R}^*$ can never reach zero. This is actually rather
intuitive: the presence of unpredictable price ‘jumps’ jeopardizes the differential
strategy of Black-Scholes. The miraculous compensation of the two terms in
Eq. (4.89) no longer takes place. Figure 4.11 gives the residual risk as a function
of $\tau$ for an option on the Bund contract, calculated using the optimal strategy
$\phi^*$, and assuming independent increments. As expected, $\mathcal{R}^*$ increases with $\tau$,
but does not tend to zero when $\tau$ decreases: the theoretical quality ratio saturates
around $\mathcal{Q} = 0.17$. In fact, the real risk is even larger than the theoretical estimate
since the model is imperfect. In particular, it neglects the volatility fluctuations,
i.e. that of the scale factor $\gamma_k$. (The importance of these volatility fluctuations
for determining the correct price was discussed above in Section 4.3.4.) In other
words, the theoretical curve shown in Figure 4.11 neglects what is usually called
the ‘volatility’ risk. A Monte-Carlo simulation of the profit and loss associated
to an at-the-money option, hedged using the optimal strategy $\phi^*$ determined
above, leads to the histogram shown in Figure 4.8. The empirical variance of
this histogram corresponds to $\mathcal{Q} \simeq 0.28$, substantially larger than the theoretical
estimate.

26 The above formula is only correct in the additive limit, but can be generalized to any Gaussian model, in
particular the log-normal Black-Scholes model.
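
The quality ratio of Eq. (4.92) can be compared with a direct simulation of discrete rehedging. The following is a minimal sketch under explicit assumptions that are ours, not the text's: an additive Gaussian random walk with variance $D\tau$ per step, zero interest rates, an at-the-money call with $x_0 = x_s = 0$, and the Gaussian $\Delta$ of Eq. (4.78) used as hedge. The function names, the number of paths and the seed are arbitrary.

```python
import math
import random


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def hedged_wealth_balance(n_steps, d_tau, rng):
    """One realisation of Delta W for an at-the-money call (x0 = xs = 0), zero rates."""
    price = math.sqrt(d_tau * n_steps / (2.0 * math.pi))  # Gaussian at-the-money price
    step_sd = math.sqrt(d_tau)
    x, gain = 0.0, 0.0
    for k in range(n_steps):
        sd_left = math.sqrt(d_tau * (n_steps - k))
        phi = normal_cdf(x / sd_left)        # Delta hedge held between k and k + 1
        dx = rng.gauss(0.0, step_sd)
        gain += phi * dx
        x += dx
    return price - max(x, 0.0) + gain


def quality_ratio_mc(n_steps, n_paths=4000, d_tau=1.0, seed=1):
    """Monte-Carlo estimate of Q = R* / C for N = n_steps rehedging dates."""
    rng = random.Random(seed)
    samples = [hedged_wealth_balance(n_steps, d_tau, rng) for _ in range(n_paths)]
    mean = sum(samples) / n_paths
    var = sum((s - mean) ** 2 for s in samples) / n_paths
    price = math.sqrt(d_tau * n_steps / (2.0 * math.pi))
    return math.sqrt(var) / price


def quality_ratio_theory(n_steps):
    """Leading-order estimate of Eq. (4.92)."""
    return math.sqrt(math.pi / (4.0 * n_steps))


print(quality_ratio_mc(25), quality_ratio_theory(25))
```

For $N = 25$ both numbers should be in the neighbourhood of 0.2, the simulated value being somewhat below the leading-order formula because of higher-order corrections in $\tau$.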


Fig. 4.11. Residual risk $\mathcal{R}^*$ as a function of the time $\tau$ (in days) between adjustments of
the hedging strategy $\phi^*$, for at-the-money options of maturity equal to 60 trading days.
The risk $\mathcal{R}^*$ decreases when the trading frequency increases, but only rather slowly: the
risk only falls by 10% when the trading time drops from 1 day to 30 min. Furthermore,
$\mathcal{R}^*$ does not tend to zero when $\tau \to 0$, at variance with the Gaussian case (dotted line).
Finally, the present model neglects the volatility risk, which leads to an even larger residual
risk (marked by the symbol ‘M.C.’), corresponding to a Monte-Carlo simulation using real
price changes.

The ‘stop-loss’ strategy does not work

There is a very simple strategy that leads, at first sight, to a perfect hedge. This
strategy is to hold $\phi = 1$ of the underlying as soon as the price $x$ exceeds the strike
price $x_s$, and to sell everything ($\phi = 0$) as soon as the price falls below $x_s$. For
zero interest rates, this ‘stop-loss’ strategy would obviously lead to zero risk, since
either the option is exercised at time $T$, but the writer of the option has bought
the underlying when its value was $x_s$, or the option is not exercised, but the writer
does not possess any stock. If this were true, the price of the option would actually
be zero, since in the global wealth balance, the term related to the hedge perfectly
matches the option pay-off!

In fact, this strategy does not work at all. A way to see this is to realize
that when the strategy takes place in discrete time, the price of the underlying
is never exactly at $x_s$, but slightly above, or slightly below. If the trading time
is $\tau$, the difference between $x_s$ and $x_k$ (where $k$ is the time in discrete units)
is, for a random walk, of the order of $\sqrt{D\tau}$. The difference between the ideal
strategy, where the buy or sell order is always executed precisely at $x_s$, and
the real strategy is thus of the order of $N_\times \sqrt{D\tau}$, where $N_\times$ is the number of
times the price crosses the value $x_s$ during the lifetime $T$ of the option. For an
at-the-money option, this is equal to the number of times a random walk returns
to its starting point in a time $T$. The result is well known, and is $N_\times \propto \sqrt{T/\tau}$
for $T \gg \tau$. Therefore, the uncertainty in the final result due to the accumulation
of the above small errors is found to be of order $\sqrt{DT}$, independently of $\tau$, and
therefore does not vanish in the continuous-time limit $\tau \to 0$. This uncertainty is
of the order of the option price itself, even for the case of a Gaussian process.
Hence, the ‘stop-loss’ strategy is not the optimal strategy, and was therefore
not found as a solution of the functional minimization of the risk presented
above.
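
The failure of the ‘stop-loss’ strategy is easy to observe in a simulation. The sketch below is an illustration only, under assumptions of our own: a Gaussian random walk with total variance $DT$ (set to 1), zero interest rates, an at-the-money option with $x_0 = x_s = 0$, and hypothetical function names. It measures the standard deviation of the hedging error in units of $\sqrt{DT}$ for increasingly fine time steps.

```python
import math
import random


def stop_loss_error(n_steps, total_var, rng):
    """Hedging gain minus pay-off for one path, at-the-money (x0 = xs = 0)."""
    step_sd = math.sqrt(total_var / n_steps)
    x, gain = 0.0, 0.0
    for _ in range(n_steps):
        phi = 1.0 if x > 0.0 else 0.0      # hold one unit only when above the strike
        dx = rng.gauss(0.0, step_sd)
        gain += phi * dx
        x += dx
    return gain - max(x, 0.0)


def stop_loss_risk(n_steps, n_paths=1000, total_var=1.0, seed=7):
    """Standard deviation of the hedging error, in units of sqrt(D T)."""
    rng = random.Random(seed)
    errors = [stop_loss_error(n_steps, total_var, rng) for _ in range(n_paths)]
    mean = sum(errors) / n_paths
    var = sum((e - mean) ** 2 for e in errors) / n_paths
    return math.sqrt(var / total_var)


risks = {n: stop_loss_risk(n) for n in (25, 100, 400)}
print(risks)
```

The risk stays at a sizeable fraction of $\sqrt{DT}$ (to be compared with the Gaussian at-the-money price $\sqrt{DT/2\pi} \approx 0.4\sqrt{DT}$) however small the time step, instead of shrinking towards zero.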
Residual risk to first order in kurtosis

It is interesting to compute the first non-Gaussian correction to the residual risk, which is
induced by a non-zero kurtosis $\kappa_1$ of the distribution at scale $\tau$. The expression of $\partial \mathcal{R}^{*2} / \partial \kappa_1$
can be obtained from Eq. (4.89), where two types of term appear: those proportional to
$\partial P / \partial \kappa_1$, and those proportional to $\partial \phi^* / \partial \kappa_1$. The latter terms are zero since by definition
the derivative of $\mathcal{R}^*$ with respect to $\phi^*$ is nil. We thus find:

$$\frac{\partial \mathcal{R}^{*2}}{\partial \kappa_1} = \int_{x_s}^{\infty} (x - x_s) \left[ (x - x_s) - 2 \langle \max(x(N) - x_s, 0) \rangle \right] \frac{\partial P(x, N|x_0, 0)}{\partial \kappa_1}\, dx
- D\tau \sum_k \int \frac{\partial P(x', k|x_0, 0)}{\partial \kappa_1}\, [\phi_k^{N*}]^2\, dx'. \qquad (4.93)$$

Now, to first order in kurtosis, one has (cf. Chapter 1):

$$\frac{\partial P(x', k|x_0, 0)}{\partial \kappa_1} \equiv \frac{k (D\tau)^2}{4!}\, \frac{\partial^4 P_G(x', k|x_0, 0)}{\partial x_0^4}, \qquad (4.94)$$

which allows one to estimate the extra risk numerically, using Eq. (4.93). The order of
magnitude of all the above terms is given by $\partial \mathcal{R}^{*2} / \partial \kappa_1 \simeq D\tau/4!$. The relative correction
to the Gaussian result, Eq. (4.91), is then simply given, very roughly, by the kurtosis $\kappa_1$.
Therefore, tail effects, as measured by the kurtosis, constitute the major source of the
residual risk when the trading time $\tau$ is small.


Stochastic volatility models 


It is instructive to consider the case where the fluctuations are purely Gaussian, but where
the volatility itself is randomly varying. In other words, the instantaneous variance is given
by $D_k = D + \delta D_k$, where $\delta D_k$ is itself a random variable of variance $(\delta D)^2$. If the different
$\delta D_k$’s are uncorrelated, this model leads to a non-zero kurtosis (cf. Eq. (2.17) and Section
1.7.2) equal to $\kappa_1 = 3 (\delta D)^2 / D^2$.
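
The relation $\kappa_1 = 3(\delta D)^2/D^2$ follows from the moments of a Gaussian mixture and can be verified without any simulation. The sketch below assumes, purely for illustration, a two-valued variance $D(1 \pm \epsilon)$ taken with equal probability, so that $(\delta D)^2 = \epsilon^2 D^2$; `mixture_kurtosis` is a hypothetical helper.

```python
def mixture_kurtosis(variances, weights):
    """Excess kurtosis of a zero-mean Gaussian whose variance is drawn at random."""
    m2 = sum(w * v for v, w in zip(variances, weights))            # <x^2> = <D_k>
    m4 = 3.0 * sum(w * v * v for v, w in zip(variances, weights))  # <x^4> = 3 <D_k^2>
    return m4 / m2 ** 2 - 3.0


D, eps = 1.0, 0.3
kappa = mixture_kurtosis([D * (1.0 - eps), D * (1.0 + eps)], [0.5, 0.5])
print(kappa, 3.0 * eps ** 2)
```

Both printed numbers are equal, since $\langle x^4 \rangle = 3 \langle D_k^2 \rangle$ and $\langle x^2 \rangle = \langle D_k \rangle = D$ for a Gaussian with fluctuating variance.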

Suppose then that the option trader follows the Black-Scholes strategy corresponding to
the average volatility $D$. This strategy is close, but not identical to, the optimal strategy
which he would follow if he knew all the future values of $\delta D_k$ (this strategy is given in
Appendix D). If $\delta D_k \ll D$, one can perform a perturbative calculation of the residual
risk associated with the uncertainty on the volatility. This excess risk is certainly of order
$(\delta D)^2$ since one is close to the absolute minimum of the risk, which is reached for $\delta D \equiv 0$.
The calculation does indeed yield a relative increase in risk whose order of magnitude is
given by:

$$\frac{\delta \mathcal{R}^*}{\mathcal{R}^*} \propto \frac{(\delta D)^2}{D^2}. \qquad (4.95)$$

If the observed kurtosis is entirely due to this stochastic volatility effect, one has
$\delta \mathcal{R}^* / \mathcal{R}^* \propto \kappa_1$. One thus recovers the result of the previous section. Again, this volatility risk can
represent an appreciable fraction of the real risk, especially for frequently hedged options.
Figure 4.11 actually suggests that the fraction of the risk due to the fluctuating volatility is
comparable to that induced by the intrinsic kurtosis of the distribution.
4.4.6 Other measures of risk: hedging and VaR (*)


It is conceptually illuminating to consider the model where the price increments
$\delta x_k$ are Lévy variables of index $\mu < 2$, for which the variance is infinite. Such
a model is also useful to describe extremely volatile markets, such as emergent
countries (like Russia!), or very speculative assets (cf. Chapter 2). In this case, the
variance of the global wealth balance $\Delta W$ is meaningless and cannot be used as a
reliable measure of the risk associated to the option. Indeed, one expects that the
distribution of $\Delta W$ behaves, for large negative $\Delta W$, as a power-law of exponent
$\mu$.

In order to simplify the discussion, let us come back to the case where the
hedging strategy $\phi$ is independent of time, and suppose interest rate effects
negligible. The wealth balance, Eq. (4.28), then clearly shows that the catastrophic
losses occur in two complementary cases:

• Either the price of the underlying soars rapidly, far beyond both the strike price
$x_s$ and $x_0$. The option is exercised, and the loss incurred by the writer is then:

$$|\Delta W_+| = x_N (1 - \phi) - x_s + \phi x_0 - \mathcal{C} \simeq x_N (1 - \phi). \qquad (4.96)$$

The hedging strategy $\phi$, in this case, limits the loss, and the writer of the option
would have been better off holding $\phi = 1$ stock per written option.

• Or, on the contrary, the price plummets far below $x_0$. The option is then not
exercised, but the strategy leads to important losses since:

$$|\Delta W_-| = \phi (x_0 - x_N) - \mathcal{C} \simeq -\phi x_N. \qquad (4.97)$$

In this case, the writer of the option should not have held any stock at all ($\phi = 0$).


However, both dangers are a priori possible. Which strategy should one follow?
Thanks to the above argument, it is easy to obtain the tail of the distribution of $\Delta W$
when $\Delta W \to -\infty$ (large losses). Since we have assumed that the distribution of
$x_N - x_0$ decreases as a power-law for large arguments,

$$P(x_N, N|x_0, 0) \underset{x_N - x_0 \to \pm\infty}{\simeq} \frac{\mu A_\pm^\mu}{|x_N - x_0|^{1+\mu}}, \qquad (4.98)$$

it is easy to show, using the results of Appendix C, that:

$$P(\Delta W) \underset{\Delta W \to -\infty}{\simeq} \frac{\mu\, \Delta W_0^\mu}{|\Delta W|^{1+\mu}}, \qquad \Delta W_0^\mu \equiv A_+^\mu (1 - \phi)^\mu + A_-^\mu \phi^\mu. \qquad (4.99)$$

The probability that the loss $|\Delta W|$ is larger than a certain value is then proportional
to $\Delta W_0^\mu$ (cf. Chapter 3). The minimization of this probability with respect to $\phi$ then
leads to an optimal ‘value-at-risk’ strategy:

$$\phi^* = \frac{A_+^\zeta}{A_+^\zeta + A_-^\zeta}, \qquad \zeta \equiv \frac{\mu}{\mu - 1}, \qquad (4.100)$$

for $1 < \mu \leq 2$.²⁷ For $\mu < 1$, $\phi^*$ is equal to 0 if $A_- > A_+$, or to 1 in the opposite
case. Several points are worth emphasizing:


• The hedge ratio $\phi^*$ is independent of moneyness ($x_s - x_0$). Because we are
interested in minimizing extreme risks, only the far tail of the wealth distribution
matters. We have implicitly assumed that we are interested in moves of the stock
price far greater than $|x_s - x_0|$, i.e. that moneyness only influences the centre of
the distribution of $\Delta W$.

• It can be shown that within this value-at-risk perspective, the strategy $\phi^*$ is
actually time independent, and also corresponds to the optimal instantaneous
hedge, where the VaR between times $k$ and $k + 1$ is minimum.

• Even if the tail amplitude $\Delta W_0$ is minimum, the variance of the final wealth is
still infinite for $\mu < 2$. However, $\Delta W_0^*$ sets the order of magnitude of probable
losses, for example with 95% confidence. As in the example of the optimal
portfolio discussed in Chapter 3, infinite variance does not necessarily mean that
the risk cannot be diversified. The option price, fixed for $\mu > 1$ by Eq. (4.33),
ought to be corrected by a risk premium proportional to $\Delta W_0^*$. Note also that
with such violent fluctuations, the smile becomes a spike!

• Finally, it is instructive to note that the histogram of $\Delta W$ is completely
asymmetrical, since extreme events only contribute to the loss side of the distribution.
As for gains, they are limited, since the distribution decreases very fast for
positive $\Delta W$.²⁸ In this case, the analogy between an option and an insurance
contract is most clear, and shows that buying or selling an option are not at all
equivalent operations, as they appear to be in a Black-Scholes world. Note that
the asymmetry of the histogram of $\Delta W$ is visible even in the case of weakly
non-Gaussian assets (Fig. 4.8).
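
The optimal value-at-risk hedge of Eq. (4.100) can be checked against a brute-force minimization of the tail amplitude $\Delta W_0^\mu$ of Eq. (4.99). The sketch below is only illustrative; the function names and the numerical values of $A_+$, $A_-$ and $\mu$ are arbitrary assumptions.

```python
def tail_amplitude(phi, a_plus, a_minus, mu):
    """Delta W_0^mu of Eq. (4.99) for a static hedge phi."""
    return (a_plus * (1.0 - phi)) ** mu + (a_minus * phi) ** mu


def var_hedge(a_plus, a_minus, mu):
    """Closed-form minimizer, Eq. (4.100), valid for 1 < mu <= 2."""
    zeta = mu / (mu - 1.0)
    return a_plus ** zeta / (a_plus ** zeta + a_minus ** zeta)


def var_hedge_grid(a_plus, a_minus, mu, grid=10001):
    """Brute-force minimizer of the tail amplitude on a regular grid of phi in [0, 1]."""
    phis = [i / (grid - 1) for i in range(grid)]
    return min(phis, key=lambda phi: tail_amplitude(phi, a_plus, a_minus, mu))


print(var_hedge(1.0, 2.0, 1.5), var_hedge_grid(1.0, 2.0, 1.5))
```

With a downside tail amplitude twice the upside one and $\mu = 1.5$, both approaches give $\phi^* = 1/9$: the fatter the tail on the downside, the less stock the writer should hold.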


As we have just discussed, the losses of the writer of an option can be very large
in the case of a Lévy process. Even in the case of a ‘truncated’ Lévy process (in
the sense defined in Chapters 1 and 2), the distribution of the wealth balance $\Delta W$
remains skewed towards the loss side. It can therefore be justified to consider other
measures of risk, not based on the variance but rather on higher moments of the
distribution, such as $\mathcal{R}_4 = (\langle \Delta W^4 \rangle)^{1/4}$, which are more sensitive to large losses.
The minimization of $\mathcal{R}_4$ with respect to $\phi(x, t)$ can still be performed, but leads
to a more complex equation for $\phi^*$, which has to be solved numerically. One finds
that this optimal strategy varies more slowly with the underlying price $x$ than that
based on the minimization of the variance, which is interesting from the point of
view of transaction costs.²⁹

27 Note that the above strategy is still valid for $\mu > 2$, and corresponds to the optimal VaR hedge for
power-law-tailed assets, see below.

28 For at-the-money options, one can actually show that $\Delta W \leq \mathcal{C}$.

Another possibility is to measure the risk through the value-at-risk (or loss
probability), as is natural to do in the case of the Lévy processes discussed above.
If the underlying asset has power-law fluctuations with an exponent $\mu > 2$, the
above computation remains valid as long as one is concerned with the extreme
tail of the loss distribution (cf. Chapter 3). The optimal VaR strategy, minimizing
the probability of extreme losses, is determined by Eq. (4.100). This strategy is
furthermore time independent, and therefore is very interesting from the point of
view of transaction costs.


4.4.7 Hedging errors 


The formulation of the hedging problem as a variational problem has a rather
interesting consequence, which is a certain amount of stability against hedging
errors. Suppose indeed that instead of following the optimal strategy $\phi^*(x, t)$, one
uses a suboptimal strategy close to $\phi^*$, such as the Black-Scholes hedging strategy
with the value of the implied volatility, discussed in Section 4.4.3. Denoting the
difference between the actual strategy and the optimal one by $\delta\phi(x, t)$, one can
show that for small $\delta\phi$, the increase in residual risk is given by:

$$\delta \mathcal{R}^2 = D\tau \sum_{k=0}^{N-1} \int [\delta\phi(x, t_k)]^2\, P(x, k|x_0, 0)\, dx, \qquad (4.101)$$


which is quadratic in the hedging error $\delta\phi$, and thus, in general, rather small. For
example, we have estimated in Section 4.4.3 that within a first-order cumulant
expansion, $\delta\phi$ is at most equal to $0.02\, \kappa_N$, where $\kappa_N$ is the kurtosis corresponding
to the terminal distribution. (See also Fig. 4.10.) Therefore, one has:

$$\delta \mathcal{R}^2 \leq 4 \times 10^{-4}\, \kappa_N^2\, DT. \qquad (4.102)$$

For at-the-money options, this corresponds to a relative increase of the residual risk
given by:

$$\frac{\delta \mathcal{R}}{\mathcal{R}^*} \lesssim 1.2 \times 10^{-3}\, \frac{\kappa_N^2}{\mathcal{Q}^2}. \qquad (4.103)$$

For a quality ratio $\mathcal{Q} = 0.25$ and $\kappa_N = 1$, this represents a rather small relative
increase equal to 2% at most. In reality, as numerical simulations show, the increase
in risk induced by the use of the Black-Scholes $\Delta$-hedge rather than the optimal
hedge is indeed only of a few per cent for 1-month maturity options. This difference
however increases as the maturity of the options decreases.

29 This has been shown in the PhD work of Farhat Selmi (2000), unpublished.
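
The arithmetic behind Eqs (4.102) and (4.103) can be made explicit. The sketch below assumes, as in the Gaussian at-the-money case, that the option price is $\mathcal{C} = \sqrt{DT/2\pi}$, that $\mathcal{R}^* = \mathcal{Q}\,\mathcal{C}$, and that $\delta\mathcal{R}/\mathcal{R}^* \simeq \delta\mathcal{R}^2 / (2 \mathcal{R}^{*2})$ for a small increase; the function name is hypothetical.

```python
import math


def relative_risk_increase(kappa_n, quality_ratio, max_hedge_error_coeff=0.02):
    """Upper bound on delta R / R* for an at-the-money option, following Eqs (4.102)-(4.103).

    delta R^2 <= (coeff * kappa_N)^2 * D T, and R*^2 = Q^2 * D T / (2 pi).
    """
    delta_r_sq_over_dt = (max_hedge_error_coeff * kappa_n) ** 2
    r_star_sq_over_dt = quality_ratio ** 2 / (2.0 * math.pi)
    return delta_r_sq_over_dt / (2.0 * r_star_sq_over_dt)


print(relative_risk_increase(kappa_n=1.0, quality_ratio=0.25))
```

For $\mathcal{Q} = 0.25$ and $\kappa_N = 1$ this gives about 0.02, i.e. the 2% quoted in the text; the prefactor $4 \times 10^{-4}\, \pi \approx 1.2 \times 10^{-3}$ is the one appearing in Eq. (4.103).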


4.4.8 Summary 


In this part of the chapter, we have thus shown that one can find an optimal 
hedging strategy, in the sense that the risk (measured by the variance of the 
change of wealth) is minimum. This strategy can be obtained explicitly, with 
very few assumptions, and is given by Eqs (4.76), or (4.80). However, for a 
non-linear pay-off, the residual risk is in general non-zero, and actually represents 
an appreciable fraction of the price of the option itself. The exception is the 
Black-Scholes model where the risk, rather miraculously, disappears. The theory 
presented here generalizes that of Black and Scholes, and is formulated as a 
variational theory. Interestingly, this means that small errors in the hedging strategy
increase the risk only in second order.


4.5 Does the price of an option depend on the mean return? 
4.5.1 The case of non-zero excess return 


We should now come back to the case where the excess return $m_1 \equiv \langle \delta x_k \rangle = m\tau$ is
non-zero. This case is very important conceptually: indeed, one of the most striking
results of Black and Scholes (besides the zero risk property) is that the price of the
option and the hedging strategy are totally independent of the value of m. This
may sound at first rather strange, since one could think that if m is very large 
and positive, the price of the underlying asset on average increases fast, thereby 
increasing the average pay-off of the option. On the contrary, if m is large and 
negative, the option should be worthless. 

This argument actually does not take into account the impact of the hedging 
strategy on the global wealth balance, which is proportional to m. In other 
words, the term $\max(x(N) - x_s, 0)$, averaged with the historical distribution
$P_m(x, N|x_0, 0)$, such that:

$$P_m(x, N|x_0, 0) = P_{m=0}(x - Nm_1, N|x_0, 0), \qquad (4.104)$$


is indeed strongly dependent on the value of m. However, this dependence is partly 
compensated when one includes the trading strategy, and even vanishes in the 
Black-Scholes model. 

Let us first present a perturbative calculation, assuming that m is small, or 
more precisely that $(mT)^2/DT \ll 1$. Typically, for T = 100 days, $mT = 5\% \times 100/365 = 0.014$
and $\sqrt{DT} \simeq 1\% \times \sqrt{100} = 0.1$. The term of order $m^2$ that
we neglect corresponds to a relative error of $(0.14)^2 \simeq 0.02$.
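The orders of magnitude quoted above are easy to reproduce. The sketch below is only an illustration: the function name and argument names are hypothetical, and the 5% annual return and 1% daily volatility are the values used in the text.

```python
import math

def drift_vs_noise(annual_return: float, daily_volatility: float, days: int):
    """Return (mT, sqrt(DT), (mT)^2 / DT) for relative price changes."""
    m_t = annual_return * days / 365.0             # drift accumulated over T
    sqrt_dt = daily_volatility * math.sqrt(days)   # typical fluctuation over T
    return m_t, sqrt_dt, (m_t / sqrt_dt) ** 2

m_t, sqrt_dt, neglected = drift_vs_noise(0.05, 0.01, 100)
print(f"mT = {m_t:.3f}, sqrt(DT) = {sqrt_dt:.2f}, neglected term ~ {neglected:.3f}")
```

The neglected second-order term is about 2% here, which is why a first-order treatment in m is reasonable for maturities of a few months.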
The average gain (or loss) induced by the hedge is equal to:³⁰
$$\langle \Delta W_S \rangle = +m_1 \sum_{k=0}^{N-1} \int P_m(x, k|x_0, 0)\,\phi_k^{N*}(x)\,dx. \qquad (4.105)$$


To order m, one can consistently use the general result Eq. (4.80) for the optimal
hedge $\phi^*$, established above for m = 0, and also replace $P_m$ by $P_{m=0}$:

$$\langle \Delta W_S \rangle = m_1 \sum_{k=0}^{N-1} \int P_0(x, k|x_0, 0) \int_{x_s}^{+\infty} (x' - x_s) \sum_{n=2}^{\infty} \frac{(-1)^{n-1} c_{n,1}}{D\tau\,(n-1)!}\,\frac{\partial^{n-1}}{\partial x'^{n-1}} P_0(x', N|x, k)\,dx'\,dx, \qquad (4.106)$$

where $P_0$ is the unbiased distribution (m = 0).
Now, using the Chapman-Kolmogorov equation for conditional probabilities:


$$\int P_0(x', N|x, k)\,P_0(x, k|x_0, 0)\,dx = P_0(x', N|x_0, 0), \qquad (4.107)$$


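one easily derives, after an integration by parts, and using the fact that
$P_0(x', N|x_0, 0)$ only depends on $x' - x_0$, the following expression:

$$\langle \Delta W_S \rangle = m_1 N \left[ \int_{x_s}^{+\infty} P_0(x', N|x_0, 0)\,dx' + \sum_{n=3}^{\infty} \frac{c_{n,1}}{D\tau\,(n-1)!}\,\frac{\partial^{n-3}}{\partial x_0^{n-3}} P_0(x_s, N|x_0, 0) \right]. \qquad (4.108)$$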


On the other hand, the increase of the average pay-off, due to a non-zero mean 
value of the increments, is calculated as: 


$$\langle \max(x(N) - x_s, 0) \rangle_m = \int_{x_s}^{\infty} (x' - x_s)\,P_m(x', N|x_0, 0)\,dx' \equiv \int_{x_s - m_1 N}^{\infty} (x' + m_1 N - x_s)\,P_0(x', N|x_0, 0)\,dx', \qquad (4.109)$$

where in the second integral, the shift $x_k \to x_k + m_1 k$ allowed us to remove the
bias m, and therefore to substitute $P_m$ by $P_0$. To first order in m, one then finds:

$$\langle \max(x(N) - x_s, 0) \rangle_m \simeq \langle \max(x(N) - x_s, 0) \rangle_0 + m_1 N \int_{x_s}^{\infty} P_0(x', N|x_0, 0)\,dx'. \qquad (4.110)$$


30 In the following, we shall again stick to an additive model and discard interest rate effects, in order to focus
on the main concepts.


Hence, grouping together Eqs (4.108) and (4.110), one finally obtains the price of 
the option in the presence of a non-zero average return as: 


$$C_m = C_0 - \frac{mT}{D\tau} \sum_{n=3}^{\infty} \frac{c_{n,1}}{(n-1)!}\,\frac{\partial^{n-3}}{\partial x_0^{n-3}} P_0(x_s, N|x_0, 0). \qquad (4.111)$$

Quite remarkably, the correction terms are zero in the Gaussian case, since all the
cumulants $c_{n,1}$ are zero for $n \ge 3$. In fact, in the Gaussian case, this property holds
to all orders in m (cf. below). However, for non-Gaussian fluctuations, one finds
that a non-zero return should in principle affect the price of the options. Using
again Eq. (4.79), one can rewrite Eq. (4.111) in a simpler form as:


$$C_m = C_0 + mT\,[\mathcal{P} - \phi^*], \qquad (4.112)$$


where $\mathcal{P}$ is the probability that the option is exercised, and $\phi^*$ the optimal strategy,
both calculated at t = 0. From Fig. 4.10 one sees that in the presence of ‘fat tails’, a
positive average return makes out-of-the-money options less expensive ($\mathcal{P} < \phi^*$),
whereas in-the-money options should be more expensive ($\mathcal{P} > \phi^*$). Again, the
Gaussian model (for which $\mathcal{P} = \phi^*$) is misleading:³¹ the independence of the
option price with respect to the market ‘trend’ only holds for Gaussian processes,
and is no longer valid in the presence of ‘jumps’. Note however that the correction
is usually numerically quite small: for $x_0 = 100$, m = 10% per year, T = 100
days, and $|\mathcal{P} - \phi^*| \simeq 0.02$, one finds that the price change is of the order of 0.05
points, while $C \simeq 4$ points.
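The size of this trend correction follows directly from Eq. (4.112). The sketch below simply evaluates mT |P − φ*| with the numbers quoted in the text; in the additive model the 10% annual return is converted into price points by multiplying by x0, and the function name is hypothetical.

```python
def trend_price_correction(x0: float, annual_return: float, days: int,
                           prob_minus_phi: float) -> float:
    """Magnitude of C_m - C_0 = mT (P - phi*), in price points."""
    m_t = annual_return * x0 * days / 365.0   # mT in points (additive model)
    return m_t * prob_minus_phi

correction = trend_price_correction(100.0, 0.10, 100, 0.02)
print(f"price change ~ {correction:.3f} points, "
      f"i.e. {correction / 4.0:.1%} of a 4-point option price")
```

The correction is a little over 1% of the option price, well below the typical uncertainty on the volatility.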


‘Risk neutral’ probability
It is interesting to notice that the result, Eq. (4.111), can alternatively be rewritten as:

$$C_m = \int_{x_s}^{\infty} (x - x_s)\,Q(x, N|x_0, 0)\,dx, \qquad (4.113)$$

with an ‘effective probability’ (called ‘risk neutral probability’, or ‘pricing kernel’ in the
mathematical literature) Q defined as:

$$Q(x, N|x_0, 0) = P_0(x, N|x_0, 0) - \frac{m_1 N}{D\tau} \sum_{n=3}^{\infty} \frac{c_{n,1}}{(n-1)!}\,\frac{\partial^{n-1}}{\partial x_0^{n-1}} P_0(x, N|x_0, 0), \qquad (4.114)$$

which satisfies the following equations:

$$\int Q(x, N|x_0, 0)\,dx = 1, \qquad (4.115)$$

$$\int (x - x_0)\,Q(x, N|x_0, 0)\,dx = 0. \qquad (4.116)$$

31 The fact that the optimal strategy is equal to the probability $\mathcal{P}$ of exercising the option also holds in the
Black-Scholes model, up to small $\sigma^2$ correction terms.
Note that the second equation means that the evolution of x under the ‘probability’ Q is
unbiased, i.e. $\langle x \rangle_Q = x_0$. This is the definition of a martingale process (see, e.g. [Baxter]).
Equation (4.113), with the very same Q, actually holds for an arbitrary pay-off function
$\mathcal{Y}(x_N)$, replacing $\max(x_N - x_s, 0)$ in the above equation. Using Eq. (4.79), Eq. (4.114)
can also be written in a more compact way as:

$$Q(x, N|x_0, 0) = P_0(x, N|x_0, 0) - \frac{m_1}{D\tau}(x - x_0)\,P_0(x, N|x_0, 0) + m_1 N\,\frac{\partial P_0(x, N|x_0, 0)}{\partial x_0}. \qquad (4.117)$$

Note however that Eq. (4.113) is rather formal, since nothing ensures that $Q(x, N|x_0, 0)$ is
everywhere positive, except in the Gaussian case where $Q(x, N|x_0, 0) = P_0(x, N|x_0, 0)$.


As we have discussed above, small errors in the hedging strategy do not
significantly increase the risk. If the followed strategy is not the optimal one
but, for example, the Black-Scholes ‘Δ’-hedge (i.e. $\phi = \Delta \equiv \partial C/\partial x$), the fair
game price is given by Eq. (4.113) with $Q(x, N|x_0, 0) = P_0(x, N|x_0, 0)$, and
is now independent of m and positive everywhere.³² The difference between this
‘suboptimal’ price and the truly optimal one is then, according to Eq. (4.112), equal
to $\delta C = mT(\phi^* - \mathcal{P})$. As already discussed, the above difference is quite small,
and leads to price corrections which are, in most cases, negligible compared to the
uncertainty in the volatility and to the residual risk $\mathcal{R}^*$.


Optimal strategy in the presence of a bias 


We now give, without the details of the computation, the general equation satisfied
by the optimal hedging strategy in the presence of a non-zero average return m, when the
price fluctuations are arbitrary but uncorrelated. Assuming that $\langle \delta x_k \delta x_\ell \rangle - m_1^2 = D\tau\,\delta_{k,\ell}$,
and introducing the unbiased variables $\chi_k \equiv x_k - m_1 k$, one gets for the optimal strategy
$\phi_k^*(\chi)$ the following (involved) integral equation:


y=) 
Droy(x) -{ (x0 = x8) FPO, NIX, ky dx! = 


s ‘ 


oO 
—m| (x' — xs)LPo(x’, NI x0. 0) — Po(x’, Nix. k)] dy’ 
− msl, (4.118)


32 If $\phi = \Delta$, the result $Q = P_0$ is in fact correct (as we show in Appendix F) beyond the first order in m.
However, the optimal strategy is not, in general, given by the option Δ, except in the Gaussian case.

with $\chi_s = x_s - m_1 N$ and


N-1 
= Sof ita’ Po(x'. Hlx0. 0) dy’ (4.119) 
k= 


In the limit $m_1 = 0$, the right-hand side vanishes and one recovers the optimal strategy
determined above. For small m, Eq. (4.118) is a convenient starting point for a perturbative
expansion in $m_1$.

Using Eq. (4.118), one can establish a simple relation for b*, which fixes the correction 
to the ‘Bachelier price’ coming from the hedge: C = (max(xy — Xs, 0)) — mi". 


= x) = x —xX ' ' 
= Yo] u'-x) Po(x'. Nix, k) Pox, klx0. 0) dx 
k=0 “Xs N-k 
N-1 x’ _x . 
-my YY f Eola Poe’, thx0, 04x". (4.120) 
/ mie =o * 


Replacing $; (x') by its corresponding equation m, = 0, one obtains the correct value of 
@* to order mj, and thus the value of C to order me included. 


4.5.2 The Gaussian case and the Black-Scholes limit 
In the continuous-time Gaussian case, the solution of Eqs (4.118) and (4.119) for
the optimal strategy $\phi^*$ happens to be completely independent of m (cf. next section
on the Ito calculus). Coming back to the variable x, one finds:
(x =x)? 


a 1 2 a _— @ =x 
vanaf JaDFon* sg exp| 2D(T —1) 


The average profit induced by the hedge is thus: 


J. (4.121) 


_ if a (x — x9 — mt)? 
mio* =m ——— ¢* (x, t) exp | dxdt. (4.122) 
ie I | J2n Dt 2Dt 
Performing the Gaussian integral over x, one finds (setting u = x'—xs and uy = Xg—Xs): 
T reo u a (u — ug — mt? 
—m i | — exp | —-————_———_ } du dt 
0 Jo V22DT du 2DT 


T fo u a (u — ug — mt)? 
i i ——— — exp | —-—--—-——_ ] du dr, 
0 Jo 2x DT ot 2DT 


m\o* 


or else: 
a ' u . (u —ug — mT)? 
* = a xX St See leg 
aie o JixDT | 2DT 


Zz: 
= exp “sae || (4.123) 
The price of the option in the presence of a non-zero return $m \neq 0$ is thus given, in
the Gaussian case, by:


i (u — ug —mT)? a 
Gion,.2) = ——— exp | - ——___——__ j du - - 
(Xp, Xs. 7D) [ Br al IDT | u—mi@d 


s Lb aeeo|-GE | 
0 V¥2xDT 2DT 
= Cn=0(%0,. Xs, T), (4.124) 


(cf. Eq. (4.43)). Hence, as announced above, the option price is indeed independent
of m in the Gaussian case. This is actually a consequence of the fact that the trading
strategy is fixed by $\phi^* = \partial C_m/\partial x$, which is indeed correct (in the Gaussian case)
even when $m \neq 0$. These results, rather painfully obtained here, are immediate
within the framework of stochastic differential calculus, as originally used by Black
and Scholes. It is thus interesting to pause for a moment and describe how option
pricing theory is usually introduced.


Ito calculus³³
The idea behind Ito’s stochastic calculus is the following. Suppose that one has to consider
a certain function f(x, t), where x is a time-dependent variable. If x was an ‘ordinary’
variable, the variation Δf of the function f between time t and t + τ would be given, for
small τ, by:

$$\Delta f = \frac{\partial f(x, t)}{\partial t}\,\tau + \frac{\partial f(x, t)}{\partial x}\,\Delta x + \frac{1}{2}\,\frac{\partial^2 f(x, t)}{\partial x^2}\,\Delta x^2 + \cdots, \qquad (4.125)$$

with $\Delta x \equiv (dx/dt)\,\tau$. The order $\tau^2$ in the above expansion looks negligible in the limit
$\tau \to 0$. However, if x is a stochastic variable with independent increments, the order of
magnitude of x(t) − x(0) is fixed by the CLT and is thus given by $\sigma_1 \sqrt{t/\tau}\,\xi$, where ξ is a
Gaussian random variable of width equal to 1, and $\sigma_1$ is the RMS of Δx.

If the limit $\tau \to 0$ is to be well defined and non-trivial (i.e. such that the random variable
ξ still plays a role), one should thus require that $\sigma_1 \propto \sqrt{\tau}$. Since $\sigma_1$ is the RMS of $(dx/dt)\,\tau$,
this means that the order of magnitude of dx/dt is proportional to $1/\sqrt{\tau}$. Hence the order
of magnitude of $\Delta x^2 = (dx/dt)^2 \tau^2$ is not $\tau^2$ but τ: one should therefore keep this term in
the expansion of Δf to order τ.

The crucial point of Ito’s differential calculus is that if the stochastic process is a
continuous-time Gaussian process, then for any small but finite time scale τ, Δx is already
the result of an infinite sum of elementary increments. Therefore, one can rewrite Eq.
(4.125), choosing as a new elementary time step $\tau' \ll \tau$, and sum all these $\tau/\tau'$ equations
to obtain Δf on the scale τ. Using the fact that for small τ, $\partial f/\partial x$ and $\partial^2 f/\partial x^2$ do not
vary much, one finds:

$$\Delta x = \sum_{i=1}^{\tau/\tau'} \Delta x'_i, \qquad \Delta x^2 = \sum_{i=1}^{\tau/\tau'} \Delta x'^2_i. \qquad (4.126)$$


33 The following section is obviously not intended to be rigorous. 




Using again the CLT between scales $\tau' \ll \tau$ and τ, one finds that Δx is a Gaussian
variable of RMS $\propto \sqrt{\tau}$. On the other hand, since $\Delta x^2$ is the sum of positive variables, it
is equal to its mean $\sigma_1'^2\,\tau/\tau'$ plus terms of order $\sqrt{\tau/\tau'}$, where $\sigma_1'^2$ is the variance of $\Delta x'$.
For consistency, $\sigma_1'^2$ must be of order $\tau'$; we will thus set $\sigma_1'^2 \equiv D\tau'$.

Hence, in the limit $\tau' \to 0$, with τ fixed, $\Delta x^2$ in Eq. (4.125) becomes a non-random
variable equal to $D\tau$, up to corrections of order $\sqrt{\tau'/\tau}$. Now, taking the limit $\tau \to 0$, one
finally finds:

$$\lim_{\tau \to 0} \frac{\Delta f}{\tau} \equiv \frac{df}{dt} = \frac{\partial f(x, t)}{\partial t} + \frac{\partial f(x, t)}{\partial x}\,\frac{dx}{dt} + \frac{D}{2}\,\frac{\partial^2 f(x, t)}{\partial x^2}, \qquad (4.127)$$

where $\lim_{\tau \to 0} \Delta x/\tau \equiv dx/dt$. Equation (4.127) means that in the double limit $\tau \to 0$,
$\tau'/\tau \to 0$:

• The second-order derivative term does not fluctuate, and thus does not depend on the
specific realization of the random process. This is obviously at the heart of the possibility
of finding a riskless hedge in this case.³⁴

• Higher-order derivatives are negligible in the limit $\tau \to 0$.

• Equation (4.127) remains valid in the presence of a non-zero bias m.


Let us now apply the formula, Eq. (4.127), to the difference df/dt between dC/dt
and $\phi\,dx/dt$. This represents the difference in the variation of the value of the
portfolio of the buyer of an option, which is worth C, and that of the writer of the
option who holds $\phi(x, t)$ underlying assets. One finds:

$$\frac{df}{dt} = -\phi\,\frac{dx}{dt} + \frac{\partial C(x, x_s, T - t)}{\partial t} + \frac{\partial C(x, x_s, T - t)}{\partial x}\,\frac{dx}{dt} + \frac{D}{2}\,\frac{\partial^2 C(x, x_s, T - t)}{\partial x^2}. \qquad (4.128)$$


One thus immediately sees that if $\phi = \phi^* \equiv \partial C(x, x_s, T - t)/\partial x$, the coefficient
of the only random term in the above expression, namely dx/dt, vanishes. The
evolution of the difference of value between the two portfolios is then known with
certainty! In this case, no party would agree on the contract unless this difference
remains fixed in time (we assume here, as above, that the interest rate is zero). In
other words, $df/dt \equiv 0$, leading to a partial differential equation for the price C:

$$\frac{\partial C(x, x_s, T - t)}{\partial t} = -\frac{D}{2}\,\frac{\partial^2 C(x, x_s, T - t)}{\partial x^2}, \qquad (4.129)$$


with a ‘final’ boundary condition: $C(x, x_s, 0) \equiv \max(x - x_s, 0)$, i.e. the value of the
option at expiry, see Eq. (4.48). The solution of this equation is the above result,
Eq. (4.43), obtained for r = 0, and the Black-Scholes strategy is obtained by
taking the derivative of the price with respect to x, since this is the condition under
which dx/dt completely disappears from the game. Note also that the fact that the
average return m is zero or non-zero does not appear in the above calculation. The
price and the hedging strategy are therefore completely independent of the average
return in this framework, a result obtained after rather cumbersome considerations
above.

34 It is precisely for the same reason that the risk is also zero for a binomial process, where $\delta x_k$ can only take
two values, see Appendix E.
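The diffusion equation obtained from the Ito argument can be checked numerically. The sketch below assumes that the zero-interest-rate Gaussian price takes the closed form (x − x_s) Φ(d) + sqrt(Dθ) φ(d), with d = (x − x_s)/sqrt(Dθ) and θ = T − t the time to expiry; this explicit form is written here as an assumption rather than quoted from the text, and all function names are hypothetical. With finite differences one verifies that ∂C/∂θ = (D/2) ∂²C/∂x², which is Eq. (4.129) since ∂/∂t = −∂/∂θ, and that the hedge ∂C/∂x equals Φ(d).

```python
import math

def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gaussian_call(x: float, strike: float, theta: float, diff: float) -> float:
    """Assumed closed form of the additive Gaussian call price, r = 0."""
    s = math.sqrt(diff * theta)
    d = (x - strike) / s
    return (x - strike) * norm_cdf(d) + s * norm_pdf(d)

def pde_residual(x, strike, theta, diff, h=1e-2):
    """dC/dtheta - (D/2) d2C/dx2, estimated with central differences."""
    c = gaussian_call
    dc_dtheta = (c(x, strike, theta + h, diff) - c(x, strike, theta - h, diff)) / (2 * h)
    d2c_dx2 = (c(x + h, strike, theta, diff) - 2 * c(x, strike, theta, diff)
               + c(x - h, strike, theta, diff)) / h ** 2
    return dc_dtheta - 0.5 * diff * d2c_dx2

def delta_hedge(x, strike, theta, diff, h=1e-2):
    """Numerical dC/dx, i.e. the hedge that removes the dx/dt term."""
    c = gaussian_call
    return (c(x + h, strike, theta, diff) - c(x - h, strike, theta, diff)) / (2 * h)

residual = pde_residual(101.0, 100.0, 30.0, 1.0)
hedge = delta_hedge(101.0, 100.0, 30.0, 1.0)
print(f"PDE residual = {residual:.2e}, hedge = {hedge:.4f}")
```

Note that the average return m never enters these functions, in line with the conclusion above.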


4.5.3 Conclusion. Is the price of an option unique? 


Summarizing the above section, the Gaussian, continuous-time limit allows one 
to use very simple differential calculus rules, which only differ from the standard 
one through the appearance of a second-order non-fluctuating term —the so-called 
‘Ito correction’. The use of this calculus rule immediately leads to the two main 
results of Black and Scholes, namely: the existence of a riskless hedging strategy, 
and the fact that the average trend disappears from the final expressions.
These two results are however not valid as soon as the hypotheses underlying
Ito’s stochastic calculus are violated (continuous-time, Gaussian statistics). The
approach based on the global wealth balance, presented in the above sections, is 
by far less elegant but more general. It allows one to understand the very peculiar 
nature of the limit considered by Black and Scholes. 

As we have already discussed, the existence of a non-zero residual risk (and 
more precisely of a negatively skewed distribution of the optimized wealth balance) 
necessarily means that the bid and ask prices of an option will be different, because 
the market makers will try to compensate for part of this risk. On the other hand, 
if the average return m is not zero, the fair price of the option explicitly depends
on the optimal strategy $\phi^*$, and thus on the chosen measure of risk (as was the case
for portfolios, the optimal strategy corresponding to a minimal variance of the final
result is different from the one corresponding to a minimum value-at-risk). The
price therefore depends on the operator, on his definition of risk and on his ability
to hedge this risk. In the Black-Scholes model, the price is uniquely determined
since all definitions of risk are equivalent (and are all zero!). This property is often
presented as a major advantage of the Gaussian model. Nevertheless, it is clear
that it is precisely the existence of an ambiguity on the price that justifies the very
existence of option markets!³⁵ A market can only exist if some uncertainty remains.
In this respect, it is interesting to note that new markets continually open, where
more and more sources of uncertainty become tradable. Option markets correspond
to a risk transfer: buying or selling a call are not identical operations (recall the
skew in the final wealth distribution), except in the Black-Scholes world where
options would actually be useless, since they would be equivalent to holding a
certain number of the underlying asset (given by the Δ).

35 This ambiguity is related to the residual risk, which, as discussed above, comes both from the presence of
price ‘jumps’, and from the very uncertainty on the parameters describing the distribution of price changes
(‘volatility risk’).


4.6 Conclusion of the chapter: the pitfalls of zero-risk 


The traditional approach to derivative pricing is to find an ideal hedging strategy, 
which perfectly duplicates the derivative contract. Its price is then, using an 
arbitrage argument, equal to that of the hedging strategy, and the residual risk 
is zero. This argument appears as such in nearly all the available books on 
derivatives and on the Black-Scholes model. For example, the last chapter of
[Hull], called ‘Review of Key Concepts’, starts with the following sentence: The
pricing of derivatives involves the construction of riskless hedges from traded 
securities. Although there is a rather wide consensus on this point of view, we 
feel that it is unsatisfactory to base a whole theory on exceptional situations: as 
explained above, both the continuous-time Gaussian model and the binomial model 
are very special models indeed. We think that it is more appropriate to start from 
the ingredient which allows the derivative markets to exist in the first place, namely
risk. In this respect, it is interesting to compare the above quote from Hull to the 
following disclaimer, found on most Chicago Board Options Exchange documents: 
Option trading involves risk! 

The idea that zero risk is the exception rather than the rule is important for a 
better pedagogy of financial risks in general; an adequate estimate of the residual 
risk (inherent to the trading of derivatives) has actually become one of the major
concerns of risk management (see also Sections 5.2, 5.3). The idea that the risk is
zero is inadequate because zero cannot be a good approximation of anything. It 
furthermore provides a feeling of apparent security which can prove disastrous on 
some occasions. For example, the Black-Scholes strategy allows one, in principle, 
to hold an insurance against the fall of one’s portfolio without buying a true Put 
option, but rather by following the associated hedging strategy. This is called an 
‘insurance portfolio’, and was much used in the 1980s, when faith in the
Black-Scholes model was at its highest. The idea is simply to sell a certain fraction of the
portfolio when the market goes down. This fraction is fixed by the Black-Scholes
Δ of the virtual option, where the strike price is the security level below which the
investor does not want to plummet. During the 1987 crash, this strategy was
particularly inefficient: not only because crash situations are the most extremely
non-Gaussian events that one can imagine (and thus the zero-risk idea is totally
absurd), but also because this strategy feeds back onto the market to make it crash
further (a drop of the market mechanically triggers further sell orders). According
to the Brady commission, this mechanism indeed contributed significantly to
enhancing the amplitude of the crash (see the discussion in [Hull]).
4.7 Appendix D: computation of the conditional mean


On many occasions in this chapter, we have needed to compute the mean value
of the instantaneous increment $\delta x_k$, restricted on trajectories starting from $x_k$ and
ending at $x_N$. We assume that the $\delta x_k$’s are identically distributed, up to a scale
factor $\gamma_k$. In other words:

$$P_{1k}(\delta x_k) \equiv \frac{1}{\gamma_k}\,P_{10}\!\left(\frac{\delta x_k}{\gamma_k}\right). \qquad (4.130)$$


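The quantity we wish to compute is then:

$$P(x_N, N|x_k, k)\,\langle \delta x_k \rangle_{(x_k,k)\to(x_N,N)} = \int \delta x_k\,\delta\!\left(x_N - x_k - \sum_{j=k}^{N-1} \delta x_j\right) \prod_{j=k}^{N-1} P_{10}\!\left(\frac{\delta x_j}{\gamma_j}\right) \frac{d\delta x_j}{\gamma_j}, \qquad (4.131)$$

where the δ function insures that the sum of increments is indeed equal to $x_N - x_k$.
Using the Fourier representation of the δ function, the right-hand side of this
equation reads:

$$\frac{1}{2\pi} \int e^{-iz(x_N - x_k)}\,(-i\gamma_k)\,\hat{P}'_{10}(\gamma_k z) \prod_{j=k+1}^{N-1} \hat{P}_{10}(z\gamma_j)\,dz. \qquad (4.132)$$

• In the case where all the $\gamma_k$’s are equal to $\gamma_0$, one recognizes:

$$\frac{1}{2\pi} \int e^{-iz(x_N - x_k)}\,\frac{-i}{N - k}\,\frac{\partial}{\partial z} [\hat{P}_{10}(z\gamma_0)]^{N-k}\,dz. \qquad (4.133)$$

Integrating by parts and using the fact that:

$$P(x_N, N|x_k, k) \equiv \frac{1}{2\pi} \int e^{-iz(x_N - x_k)}\,[\hat{P}_{10}(z\gamma_0)]^{N-k}\,dz, \qquad (4.134)$$

one finally obtains the expected result:

$$\langle \delta x_k \rangle_{(x_k,k)\to(x_N,N)} \equiv \frac{x_N - x_k}{N - k}. \qquad (4.135)$$

• In the case where the $\gamma_k$’s are different from one another, one can write the result
as a cumulant expansion, using the cumulants $c_{n,1}$ of the distribution $P_{10}$. After
a simple computation, one finds:

$$P(x_N, N|x_k, k)\,\langle \delta x_k \rangle_{(x_k,k)\to(x_N,N)} = \sum_{n=2}^{\infty} \frac{(-1)^{n-1}\,\gamma_k^n\,c_{n,1}}{(n-1)!}\,\frac{\partial^{n-1}}{\partial x_N^{n-1}} P(x_N, N|x_k, k), \qquad (4.136)$$
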


which allows one to generalize the optimal strategy in the case where the
volatility is time dependent. In the Gaussian case, all the $c_n$ for $n \ge 3$ are zero,
and the last expression boils down to:

$$\langle \delta x_k \rangle_{(x_k,k)\to(x_N,N)} = \frac{\gamma_k^2\,(x_N - x_k)}{\sum_{j=k}^{N-1} \gamma_j^2}. \qquad (4.137)$$

This expression can be used to show that the optimal strategy is indeed
the Black-Scholes Δ, for arbitrary Gaussian increments in the presence of a
non-zero interest rate. Suppose that

$$x_{k+1} - x_k = \rho x_k + \delta x_k, \qquad (4.138)$$

where the $\delta x_k$ are identically distributed. One can then write $x_N$ as:

$$x_N = x_0 (1 + \rho)^N + \sum_{k=0}^{N-1} \delta x_k\,(1 + \rho)^{N-k-1}. \qquad (4.139)$$

The above formula, Eq. (4.137), then reads:

$$\langle \delta x_k \rangle_{(x_k,k)\to(x_N,N)} = \frac{\gamma_k\,\left(x_N - x_k (1 + \rho)^{N-k}\right)}{\sum_{j=k}^{N-1} \gamma_j^2}, \qquad (4.140)$$

with $\gamma_k = (1 + \rho)^{N-k-1}$.


4.8 Appendix E: binomial model 


The binomial model for price evolution is due to Cox, Ross and Rubinstein, and
shares with the continuous-time Gaussian model the zero risk property. This model
is very much used in practice [Hull], due to its easy numerical implementation.
Furthermore, the zero risk property appears in the clearest fashion. Suppose indeed
that between $t_k = k\tau$ and $t_{k+1}$, the price difference can only take two values:
$\delta x_k = \delta x_{1,2}$. For this very reason, the option value can evolve along two paths
only, to be worth (at time $t_{k+1}$) $C_{1,2}^{k+1}$. Consider now the hedging strategy where
one holds a fraction φ of the underlying asset, and a quantity B of bonds, with a
risk-free interest rate ρ. If one chooses $\phi_k$ and $B_k$ such that:

$$\phi_k (x_k + \delta x_1) + B_k (1 + \rho) = C_1^{k+1}, \qquad (4.141)$$
$$\phi_k (x_k + \delta x_2) + B_k (1 + \rho) = C_2^{k+1}, \qquad (4.142)$$

or else:

$$\phi_k = \frac{C_1^{k+1} - C_2^{k+1}}{\delta x_1 - \delta x_2}, \qquad B_k (1 + \rho) = \frac{\delta x_1 C_2^{k+1} - \delta x_2 C_1^{k+1}}{\delta x_1 - \delta x_2} - \phi_k x_k, \qquad (4.143)$$

one sees that in the two possible cases ($\delta x_k = \delta x_{1,2}$), the value of the hedging
portfolio is strictly equal to the option value. The option value at time $t_k$ is thus
equal to $C^k(x_k) = \phi_k x_k + B_k$, independently of the probability p [resp. 1 − p]
that $\delta x_k = \delta x_1$ [resp. $\delta x_2$]. One then determines the option value by iterating this
procedure from time k + 1 = N, where $C^N$ is known, and equal to $\max(x_N - x_s, 0)$.
It is however easy to see that as soon as $\delta x_k$ can take three or more values, it is
impossible to find a perfect strategy.³⁶
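The backward iteration described above can be sketched in a few lines of Python. This is a minimal illustration for the additive two-valued model, with hypothetical function and argument names: at each node the pair (φ, B) is obtained by solving the two replication conditions, and the option value is φ x + B. The transition probability p never appears in the code.

```python
def binomial_call_price(x0, strike, dx_up, dx_down, rho, n_steps):
    """Price a call by replication on an additive two-valued tree.

    Returns (price, phi, bonds) at the root node. n_steps must be >= 1.
    """
    # Option values at expiry, indexed by the number j of 'up' moves.
    values = [max(x0 + j * dx_up + (n_steps - j) * dx_down - strike, 0.0)
              for j in range(n_steps + 1)]
    phi = bonds = 0.0
    for k in range(n_steps - 1, -1, -1):
        new_values = []
        for j in range(k + 1):
            x = x0 + j * dx_up + (k - j) * dx_down
            c_up, c_down = values[j + 1], values[j]
            # Solve phi*(x + dx) + B*(1 + rho) = C for both outcomes.
            phi = (c_up - c_down) / (dx_up - dx_down)
            bonds = (c_up - phi * (x + dx_up)) / (1.0 + rho)
            new_values.append(phi * x + bonds)
        values = new_values
    return values[0], phi, bonds

price, phi, bonds = binomial_call_price(100.0, 100.0, 5.0, -5.0, 0.0, 1)
print(price, phi, bonds)
```

For the one-step example, holding φ = 0.5 of the asset and B = −47.5 in bonds reproduces the pay-off in both outcomes (5 if the price rises, 0 if it falls), so the option is worth 2.5 whatever the probability of a rise.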

The Ito process can be obtained as the continuum limit of the binomial tree.
But even in its discrete form, the binomial model shares some important properties
with the Black-Scholes model. The independence of the premium on the transition
probabilities is the analogue of the independence of the premium on the excess
return m in the Black-Scholes model. The magic of zero risk in the binomial model
can therefore be understood as follows. Consider the quantity $s^2 = (\delta x_k - \langle \delta x_k \rangle)^2$;
$s^2$ is in principle random, but since changing the probabilities does not modify the
option price one can pick p = 1/2, making $s^2$ non-fluctuating ($s^2 = (\delta x_1 - \delta x_2)^2/4$).
The Ito process shares this property: in the continuum limit quadratic quantities do
not fluctuate. For example, the quantity

$$S^2 = \lim_{\tau \to 0} \sum_{k=0}^{T/\tau - 1} \left(x((k+1)\tau) - x(k\tau) - m\tau\right)^2, \qquad (4.144)$$

is equal to DT with probability one when x(t) follows an Ito process. In a sense,
continuous-time Brownian motion represents a very weak form of randomness
since quantities such as $S^2$ can be known with certainty. But it is precisely this
property that allows for zero risk in the Black-Scholes world.
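The statement that the quadratic quantity of Eq. (4.144) stops fluctuating in the continuum limit can be illustrated with a small simulation. The sketch below uses only the Python standard library; the parameter values, the seed and the function name are arbitrary illustrative choices. It draws Gaussian increments of mean mτ and variance Dτ and sums the squared, drift-corrected increments; the relative scatter of the result around DT shrinks like sqrt(2τ/T).

```python
import math
import random

def quadratic_variation(diffusion, horizon, drift, n_steps, seed=0):
    """Sum of (x((k+1)tau) - x(k tau) - m tau)^2 over one simulated path."""
    rng = random.Random(seed)
    tau = horizon / n_steps
    total = 0.0
    for _ in range(n_steps):
        increment = drift * tau + math.sqrt(diffusion * tau) * rng.gauss(0.0, 1.0)
        total += (increment - drift * tau) ** 2
    return total

D, T, m = 0.04, 1.0, 0.10
for n in (10, 1_000, 100_000):
    ratio = quadratic_variation(D, T, m, n) / (D * T)
    print(f"N = {n:>6}: S^2 / DT = {ratio:.4f}")
```

With a handful of steps the ratio is visibly random; with many small steps it settles close to 1 for essentially every path.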


4.9 Appendix F: option price for (suboptimal) Δ-hedging


If $\phi = \Delta$, the ‘risk neutral’ probability $Q(x, N|x_0, 0)$ is simply equal to
$P_0(x, N|x_0, 0)$, for N large, beyond the first order in m, as we show now. Taking
$\phi = \partial C/\partial x$ leads to an implicit equation for C:

$$C(x_0, x_s, N) = \int \mathcal{Y}(x' - x_s)\,P_m(x', N|x_0, 0)\,dx' - m_1 \sum_{k=0}^{N-1} \int \frac{\partial C(x, x_s, N - k)}{\partial x}\,P_m(x, k|x_0, 0)\,dx, \qquad (4.145)$$

with $\mathcal{Y}$ representing the pay-off of the option. This equation can be solved by
making the following ansatz for C:

$$C(x, x_s, N - k) = \int \Omega(x' - x_s, N - k)\,P_m(x', N|x, k)\,dx', \qquad (4.146)$$

where Ω is an unknown kernel which we try to determine. Using the fact that the
option price only depends on the price difference between the strike price and the
present price of the underlying, this gives:

$$\int \Omega(x' - x_s, N)\,P_m(x', N|x_0, 0)\,dx' = \int \mathcal{Y}(x' - x_s)\,P_m(x', N|x_0, 0)\,dx' + m_1 \sum_{k=0}^{N-1} \int P_m(x, k|x_0, 0)\,\frac{\partial}{\partial x_s} \int \Omega(x' - x_s, N - k)\,P_m(x', N|x, k)\,dx'\,dx. \qquad (4.147)$$

36 For a recent analysis along the lines of the present book, see E. Aurell, S. Simdyankin, Pricing Risky Options
Simply, International Journal of Theoretical and Applied Finance, 1, 1 (1998). See also M. Schweizer, Risky
options simplified, International Journal of Theoretical and Applied Finance, 2, 59 (1999).

Xs 
Now, using the Chapman-Kolmogorov equation:

$$\int P_m(x', N|x, k)\,P_m(x, k|x_0, 0)\,dx = P_m(x', N|x_0, 0), \qquad (4.148)$$

one obtains the following equation for Ω (after changing $k \to N - k$):

$$\Omega(x' - x_s, N) + m_1 \sum_{k=1}^{N} \frac{\partial \Omega(x' - x_s, k)}{\partial x'} = \mathcal{Y}(x' - x_s). \qquad (4.149)$$


The solution to this equation is $\Omega(x' - x_s, k) = \mathcal{Y}(x' - x_s - m_1 k)$. Indeed, if this
is the case, one has:

$$\frac{\partial \Omega(x' - x_s, k)}{\partial x'} \equiv -\frac{1}{m_1}\,\frac{\partial \Omega(x' - x_s, k)}{\partial k}, \qquad (4.150)$$

and therefore:

$$\mathcal{Y}(x' - x_s - m_1 N) - \sum_{k=1}^{N} \frac{\partial \mathcal{Y}(x' - x_s - m_1 k)}{\partial k} \simeq \mathcal{Y}(x' - x_s), \qquad (4.151)$$

where the last equality holds in the small τ, large N limit, when the sum over k can
be approximated by an integral.³⁷ Therefore, the price of the option is given by:

$$C = \int \mathcal{Y}(x' - x_s - m_1 N)\,P_m(x', N|x_0, 0)\,dx'. \qquad (4.152)$$

37 The resulting error is of the order of $m^2\tau/D$.

Now, using the fact that $P_m(x', N|x_0, 0) \equiv P_0(x' - m_1 N, N|x_0, 0)$, and changing
variable from $x' \to x' - m_1 N$, one finally finds:

$$C = \int \mathcal{Y}(x' - x_s)\,P_0(x', N|x_0, 0)\,dx', \qquad (4.153)$$

thereby proving that the pricing kernel Q is equal to $P_0$ if the chosen hedge is
the Δ. Note that, interestingly, this is true independently of the particular pay-off
function $\mathcal{Y}$: Q thus has a rather general status.
Options: some more specific problems


This chapter can be skipped at first reading. 
(J.-P. Bouchaud, M. Potters, Theory of Financial Risks.) 


5.1 Other elements of the balance sheet 


We have until now considered the simplest possible option problem, and tried 
to extract the fundamental ideas associated to its pricing and hedging. In reality, 
several complications appear, either in the very definition of the option contract (see 
next section), or in the elements that must be included in the wealth balance — for 
example dividends or transaction costs — that we have neglected up to now. 


5.1.1 Interest rate and continuous dividends 


The influence of interest rates (and continuous dividends) can be estimated using
different models for the statistics of price increments. These models give slightly
different answers; however, in a first approximation, they provide the following
answer for the option price:

$$C(x_0, x_s, T, r) = e^{-rT}\,C(x_0 e^{rT}, x_s, T, r = 0), \qquad (5.1)$$

where $C(x_0, x_s, T, r = 0)$ is simply given by the average of the pay-off over the
terminal price distribution. In the presence of a non-zero continuous dividend d,
the quantity $x_0 e^{rT}$ should be replaced by $x_0 e^{(r-d)T}$, i.e. in all cases the present
price of the underlying should be replaced by the corresponding forward price (see
Eq. (4.23)).

Let us present three different models leading to the above approximate formula; 
but with slightly different corrections to it in general. These three models offer 
alternative ways to understand the influence of a non-zero interest rate (and/or 
dividend) on the price of the option. 
Systematic drift of the price


In this case, one assumes that the price evolves according to $x_{k+1} - x_k = (\rho - \delta) x_k + \delta x_k$,
where δ is the dividend over a unit time period τ: $\delta = d\tau$, and the $\delta x_k$ are
independent random variables. Note that this ‘dividend’ is, in the case of currency
options, the interest rate corresponding to the underlying currency (and ρ is the
interest rate of the reference currency).

This model is convenient because if $\langle \delta x_k \rangle$ is zero, the average cost of the hedging
strategy is zero, since it includes the terms $x_{k+1} - x_k - (\rho - \delta) x_k$. However, the
terminal distribution of $x_N$ must be constructed by noting that:

$$x_N = x_0 (1 + \rho - \delta)^N + \sum_{k=0}^{N-1} \delta x_k\,(1 + \rho - \delta)^{N-k-1}. \qquad (5.2)$$

Thus the terminal distribution is, for large N, a function of the difference
$x(T) - x_0 e^{(r-d)T}$. Furthermore, even if the $\delta x_k$ are independent random variables of
variance $D\tau$, the variance $c_2(T)$ of the terminal distribution is given by:

$$c_2(T) = \frac{D}{2(r - d)} \left[e^{2(r-d)T} - 1\right], \qquad (5.3)$$

which is equal to $c_2(T) = DT(1 + (r - d)T)$ when $(r - d)T \to 0$. There is
thus an increase of the effective volatility to be used in the option pricing formula
within this model. This increase depends on the maturity; its relative value is, for
T = 1 year, r − d = 5%, equal to $(r - d)T/2 = 2.5\%$. (Note indeed that $c_2(T)$
is the square of the volatility.) Up to this small change of volatility, the above rule,
Eq. (5.1) is therefore appropriate.
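The variance formula and the 2.5% figure can be verified numerically. The sketch below, with hypothetical function names, compares the continuous expression of Eq. (5.3) with the discrete sum implied by Eq. (5.2), taking ρ − δ = (r − d)τ, and converts the variance ratio into a relative increase of volatility.

```python
import math

def terminal_variance(diffusion, horizon, net_rate):
    """Continuous-time variance, Eq. (5.3); net_rate stands for r - d."""
    if net_rate == 0.0:
        return diffusion * horizon
    return diffusion / (2.0 * net_rate) * math.expm1(2.0 * net_rate * horizon)

def terminal_variance_discrete(diffusion, horizon, net_rate, n_steps):
    """Variance of the sum in Eq. (5.2) for independent increments."""
    tau = horizon / n_steps
    growth = 1.0 + net_rate * tau
    return sum(diffusion * tau * growth ** (2 * (n_steps - k - 1))
               for k in range(n_steps))

c2 = terminal_variance(1.0, 1.0, 0.05)            # D = 1, T = 1 year
c2_discrete = terminal_variance_discrete(1.0, 1.0, 0.05, 250)
vol_increase = math.sqrt(c2 / 1.0) - 1.0          # relative to sqrt(DT)
print(f"c2 = {c2:.4f}, discrete = {c2_discrete:.4f}, "
      f"volatility increase = {vol_increase:.2%}")
```

The exact increase is about 2.55%, consistent with the first-order estimate (r − d)T/2 = 2.5%.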


Independence between price increments and interest rates—dividends 


The above model is perhaps not very realistic since it assumes a direct relation
between price changes and interest rates-dividends. It might be more appropriate
to consider the case where $x_{k+1} - x_k = \delta x_k$, as the other extreme case; reality is
presumably in between these two limiting models. Now the terminal distribution is
a function of the difference $x_N - x_0$, with no correction to the variance brought
about by interest rates. However, in the present case, the cost of the hedging
strategy cannot be neglected and reads:

$$\langle \Delta W_S \rangle = -(\rho - \delta) \sum_{k=0}^{N-1} \langle x_k\,\phi_k^{N*}(x_k) \rangle. \qquad (5.4)$$

k=0 
Writing $x_k = x_0 + \sum_{j=0}^{k-1} \delta x_j$, we find that
$\langle \Delta W_S \rangle$ is the sum of a term proportional to $x_0$ and the
remainder. The first contribution therefore reads:

$$\langle \Delta W_S \rangle_1 = -(\rho - \delta)\, x_0 \sum_{k=0}^{N-1} \int P(x, k|x_0, 0)\, \phi_k^{N*}(x)\, dx. \tag{5.5}$$

This term thus has exactly the same shape as in the case of a non-zero average
return treated above, Eq. (4.105), with an effective mean return
$m_{1,\mathrm{eff}} = -(\rho - \delta)x_0$. If we choose for $\phi^*$ the suboptimal
$\Delta$-hedge (which we know only induces minor errors), then, as shown above, this
hedging cost can be reabsorbed in a change of the terminal distribution, where
$x_N - x_0$ becomes $x_N - x_0 - m_{1,\mathrm{eff}}N$, or else, to first order:
$x_N - x_0 \exp[(\rho - \delta)N]$. To order $(\rho - \delta)N$, this therefore again
corresponds to replacing the current price of the underlying by its forward price.
For the second type of term, we write, for $j < k$:

$$\langle \delta x_j\, \phi_k^{N*}(x_k) \rangle = \int P(x_j, j|x_0, 0)\, dx_j \int P(x_k, k|x_j, j)\, \frac{x_k - x_j}{k - j}\, \phi_k^{N*}(x_k)\, dx_k. \tag{5.6}$$

This type of term is not easy to calculate in the general case. Assuming that the
process is Gaussian allows us to use the identity:

$$P(x_k, k|x_j, j)\, \frac{x_k - x_j}{k - j} = -D\tau\, \frac{\partial P(x_k, k|x_j, j)}{\partial x_k}; \tag{5.7}$$

one can therefore integrate over $x_j$ to find a result independent of $j$:

$$\langle \delta x_j\, \phi_k^{N*}(x_k) \rangle = D\tau \int P(x_k, k|x_0, 0)\, \frac{\partial \phi_k^{N*}(x_k)}{\partial x_k}\, dx_k, \tag{5.8}$$

or, using the expression of $\phi_k^{N*}(x_k)$ in the Gaussian case:

$$\langle \delta x_j\, \phi_k^{N*}(x_k) \rangle = D\tau\, \frac{\partial}{\partial x_0} \int_{x_s}^{\infty} P(x', N|x_0, 0)\, dx' = D\tau\, P(x_s, N|x_0, 0). \tag{5.9}$$

The contribution to the cost of the strategy is then obtained by summing over $k$ and
over $j < k$, finally leading to:

$$\langle \Delta W_S \rangle_2 \simeq -\frac{N^2}{2}\,(\rho - \delta)\, D\tau\, P(x_s, N|x_0, 0). \tag{5.10}$$

This extra cost is maximum for at-the-money options, and leads to a relative
increase of the call price given, for Gaussian statistics, by:

$$\frac{\delta C}{C} = \frac{N}{2}(\rho - \delta) = \frac{T}{2}(r - d), \tag{5.11}$$

which is thus quite small for short maturities. (For $r = 5\%$ annual, $d = 0$, and
$T = 100$ days, one finds $\delta C/C < 1\%$.)
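The step from Eq. (5.10) to Eq. (5.11) can be made explicit for an at-the-money option with Gaussian statistics, where the terminal density at the strike is $1/\sqrt{2\pi D\tau N}$ and the zero-rate call price is $\sqrt{D\tau N/2\pi}$. The sketch below is illustrative only; the function name and the 365-day year used to convert the annual rate are assumptions.

```python
import math

def relative_call_increase(rho_minus_delta, D_tau, n_steps):
    """Ratio of the extra hedging cost, Eq. (5.10), to the at-the-money
    Gaussian call price, which should reproduce Eq. (5.11)."""
    variance = D_tau * n_steps
    density_atm = 1.0 / math.sqrt(2.0 * math.pi * variance)   # P(x_s, N | x_0, 0)
    call_atm = math.sqrt(variance / (2.0 * math.pi))          # <max(x_N - x_s, 0)>
    extra_cost = 0.5 * n_steps**2 * rho_minus_delta * D_tau * density_atm
    return extra_cost / call_atm

# Assumed day count: 365 days per year, r = 5% annual, d = 0, T = 100 days.
rho = 0.05 / 365.0
ratio = relative_call_increase(rho, D_tau=1.0, n_steps=100)
print(ratio)
```

The result does not depend on the value of $D\tau$, as expected from Eq. (5.11), and stays below 1% for these parameters.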
Multiplicative model


Let us finally assume that $x_{k+1} - x_k = (\rho - \delta + \eta_k)x_k$, where the
$\eta_k$'s are independent random variables of zero mean. Therefore, the average cost
of the strategy is zero. Furthermore, if the elementary time step $\tau$ is small,
the terminal value $x_N$ can be written as:

$$\log\left(\frac{x_N}{x_0}\right) = \sum_{k=0}^{N-1} \log(1 + \rho - \delta + \eta_k) \simeq N(\rho - \delta) + y; \qquad y = \sum_{k=0}^{N-1} \eta_k, \tag{5.12}$$

thus $x_N = x_0 e^{(r-d)T + y}$. Introducing the distribution $P(y)$, the price of
the option can be written as:

$$C(x_0, x_s, T, r, d) = e^{-rT} \int P(y)\, \max\left(x_0 e^{(r-d)T + y} - x_s,\, 0\right) dy, \tag{5.13}$$

which is obviously of the general form, Eq. (5.1).

We thus conclude that the simple rule, Eq. (5.1), above is, for many purposes,
sufficient to account for interest rates and (continuous) dividends effects. A more
accurate formula, including corrections of order $(r - d)T$, depends somewhat on
the chosen model.


5.1.2 Interest rates corrections to the hedging strategy 


It is also important to estimate the interest rate corrections to the optimal hedging strategy.
Here again, one could consider the above models, that is, the systematic drift model,
the independent model or the multiplicative model. In the former case, the result for a
general terminal price distribution is rather involved, mainly due to the appearance of the
conditional average detailed in Appendix D, Eq. (4.140). In the case where this distribution
is Gaussian, the result is simply the Black-Scholes $\Delta$-hedge, i.e. the optimal strategy is the
derivative of the option price with respect to the price of the underlying contract (see Eq.
(4.78)). As discussed above, this simple rule leads in general to a suboptimal strategy, but
the relative increase of risk is rather small. The $\Delta$-hedge procedure, on the other hand, has
the advantage that the price is independent of the average return (see Appendix F).

In the 'independent' model, where the price change is unrelated to the interest rate and/or
dividend, the order $\rho$ correction to the optimal strategy is easier to estimate in general.
From the general formula, Eq. (4.118), one finds:


= a ote: xs. —k) 


oi(x) = aipidge - 
xx aes RPG AP CREO) egy a. 5 
oo dx 
¥ P(x, k\xo.0) Ae 
- ; een (5.14) 


where $\phi_k^{0*}$ is the optimal strategy for $r = d = 0$.
Finally, for the multiplicative model, the optimal strategy is given by a much simpler
expression:


$$\phi_k^{N*}(x) = \frac{(1+\rho)^{1+k-N}}{x\sigma^2(N-k)} \int_{x_s}^{\infty} (x' - x_s)\, \log\left(\frac{x'}{x(1+\rho-\delta)^{N-k}}\right) P(x', N|x, k)\, dx'. \tag{5.15}$$


5.1.3 Discrete dividends 


More generally, for an arbitrary dividend $d_k$ (per share) at time $t_k$, the extra
term in the wealth balance reads:

$$\Delta W_D = \sum_{k=1}^{N} \phi_k^N(x_k)\, d_k. \tag{5.16}$$

Very often, this dividend occurs once a year, at a given date $k_0$:
$d_k = d_0 \delta_{k,k_0}$. In this case, the corresponding share price decreases
immediately by the same amount (since the value of the company is decreased by an
amount equal to $d_0$ times the number of outstanding shares): $x \to x - d_0$.
Therefore, the net balance $d_k + \delta x_k$ associated to the dividend is zero. For
the same reason, the probability $P_{d_0}(x, N|x_0, 0)$ is given, for $N > k_0$, by
$P_{d_0=0}(x + d_0, N|x_0, 0)$. The option price is then equal to:
$C_{d_0}(x_0, x_s, N) \equiv C(x_0, x_s + d_0, N)$. If the dividend $d_0$ is not known
in advance with certainty, this last equation should be averaged over a distribution
$P(d_0)$ describing as well as possible the probable values of $d_0$. A possibility is
to choose the distribution of least information (maximum entropy) such that the
average dividend is fixed to $\bar d$, which in this case reads:

$$P(d_0) = \frac{1}{\bar d} \exp(-d_0/\bar d). \tag{5.17}$$
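As an illustration of the averaging over $P(d_0)$, the sketch below integrates the rule $C_{d_0} = C(x_0, x_s + d_0, N)$ against the exponential distribution of Eq. (5.17). It is a minimal example under stated assumptions: the call price is taken from the additive Gaussian (Bachelier) model at zero interest rate, and the function names, the quadrature grid and the numerical values are all illustrative choices rather than anything prescribed by the text.

```python
import math

def bachelier_call(x0, strike, sigma_total):
    """Zero-rate call price for a Gaussian terminal price of std sigma_total
    (assumed pricing model for this illustration)."""
    z = (x0 - strike) / sigma_total
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (x0 - strike) * cdf + sigma_total * pdf

def call_with_uncertain_dividend(x0, strike, sigma_total, mean_div, n_grid=4000):
    """Average C(x0, strike + d0) over P(d0) = exp(-d0/mean_div)/mean_div
    using the trapezoidal rule on [0, 40 * mean_div]."""
    upper = 40.0 * mean_div
    h = upper / n_grid
    total = 0.0
    for i in range(n_grid + 1):
        d0 = i * h
        weight = 0.5 if i in (0, n_grid) else 1.0
        density = math.exp(-d0 / mean_div) / mean_div
        total += weight * density * bachelier_call(x0, strike + d0, sigma_total)
    return total * h

# Illustrative values: x0 = x_s = 100, terminal std 10, average dividend 2.
no_div = bachelier_call(100.0, 100.0, 10.0)
known_div = bachelier_call(100.0, 102.0, 10.0)
uncertain_div = call_with_uncertain_dividend(100.0, 100.0, 10.0, mean_div=2.0)
print(no_div, known_div, uncertain_div)
```

Because the call price is a convex function of the strike, the averaged price lies above the price computed with the dividend fixed at its mean, while remaining below the dividend-free price.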


5.1.4 Transaction costs 


The problem of transaction costs is important: the rebalancing of the optimal hedge
as time passes induces extra costs that must be included in the wealth balance as
well. These 'costs' are of different nature: some are proportional to the number of
traded shares, whereas another part is fixed, independent of the total amount of the
operation. Let us examine the first situation, assuming that the rebalancing of the
hedge takes place at every time step $\tau$. The corresponding cost is then equal to:

$$\delta W_{\mathrm{tr},k} = \nu\, |\phi^*_{k+1} - \phi^*_k|, \tag{5.18}$$

where $\nu$ is a measure of the transaction costs. In order to keep the discussion
simple, we shall assume that:

• the most important part in the variation of $\phi^*_k$ is due to the change of price of the
underlying, and not from the explicit time dependence of $\phi^*_k$ (this is justified in
the limit where $\tau \ll T$);

• $\delta x_k$ is small enough such that a Taylor expansion of $\phi^*$ is acceptable;

• the kurtosis effects on $\phi^*$ are neglected, which means that the simple Black-
Scholes $\Delta$-hedge is acceptable and does not lead to large hedging errors.


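One can therefore write that:

$$\delta W_{\mathrm{tr},k} = \nu\, |\delta x_k| \left| \frac{\partial \phi^*_k(x)}{\partial x} \right| = \nu\, |\delta x_k|\, \frac{\partial \phi^*_k(x)}{\partial x}, \tag{5.19}$$

since $\partial \phi^*_k(x)/\partial x$ is positive. The average total cost associated
to rehedging is then, after a straightforward calculation:¹

$$\langle \Delta W_{\mathrm{tr}} \rangle = \left\langle \sum_{k=0}^{N-1} \delta W_{\mathrm{tr},k} \right\rangle = \nu\, \langle |\delta x| \rangle\, N\, P(x_s, N|x_0, 0). \tag{5.20}$$

The order of magnitude of $\langle |\delta x| \rangle$ is given by $\sigma_1 x_0$: for
an at-the-money option, $P(x_s, N|x_0, 0) \simeq (\sigma_1 x_0 \sqrt{N})^{-1}$; hence,
finally, $\langle \Delta W_{\mathrm{tr}} \rangle \simeq \nu\sqrt{N}$. It is natural to
compare $\langle \Delta W_{\mathrm{tr}} \rangle$ to the option price itself, which is
of order $C \simeq \sigma_1 x_0 \sqrt{N}$:

$$\frac{\langle \Delta W_{\mathrm{tr}} \rangle}{C} \propto \frac{\nu}{\sigma_1 x_0}. \tag{5.21}$$
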

This part of the transaction costs is in general proportional to $x_0$: taking for
example $\nu = 10^{-4}x_0$, $\tau = 1$ day, and a daily volatility of
$\sigma_1 = 1\%$, one finds that the transaction costs represent 1% of the option
price. On the other hand, for higher transaction costs (say $\nu = 10^{-2}x_0$), a
daily rehedging becomes absurd, since the ratio, Eq. (5.21), is of order 1. The
rehedging frequency should then be lowered, such that the volatility on the scale of
$\tau$ increases, until $\nu$ becomes small compared with $\sigma_1 x_0$.

The fixed part of the transaction costs is easier to discuss. If these costs are equal
to $\nu'$ per transaction, and if the hedging strategy is rebalanced every $\tau$, the
total cost incurred is simply given by:

$$\Delta W'_{\mathrm{tr}} = N\nu'. \tag{5.22}$$

Comparing the two types of costs leads to:

$$\frac{\Delta W'_{\mathrm{tr}}}{\langle \Delta W_{\mathrm{tr}} \rangle} \propto \frac{\nu'\sqrt{N}}{\nu}, \tag{5.23}$$

showing that the latter costs can exceed the former when $N = T/\tau$ is large, i.e.
when the hedging frequency is high.


¹ One should add to the following formula the cost associated with the initial value of the hedge $\phi^*_0$.

In summary, the transaction costs are higher when the trading time is smaller.
On the other hand, decreasing $\tau$ allows one to diminish the residual risk (Fig.
4.11). This analysis suggests that the optimal trading frequency should be chosen
such that the transaction costs are comparable to the residual risk.
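The orders of magnitude in Eqs. (5.20) to (5.23) can be reproduced with explicit Gaussian prefactors. The sketch below is an illustration under assumptions that go beyond the text: increments are taken to be Gaussian, so that $\langle|\delta x|\rangle = \sigma_1 x_0\sqrt{2/\pi}$, and the option is at the money; the function names are hypothetical.

```python
import math

def proportional_cost(nu, sigma1, x0, n_steps):
    """Average proportional rehedging cost, Eq. (5.20), for an at-the-money
    option with Gaussian increments (assumed)."""
    mean_abs_dx = sigma1 * x0 * math.sqrt(2.0 / math.pi)
    density_atm = 1.0 / (sigma1 * x0 * math.sqrt(2.0 * math.pi * n_steps))
    return nu * mean_abs_dx * n_steps * density_atm

def cost_to_price_ratio(nu, sigma1, x0, n_steps):
    """Proportional cost divided by the at-the-money Gaussian call price."""
    call_atm = sigma1 * x0 * math.sqrt(n_steps / (2.0 * math.pi))
    return proportional_cost(nu, sigma1, x0, n_steps) / call_atm

def fixed_to_proportional_ratio(nu_fixed, nu, sigma1, x0, n_steps):
    """Ratio of the fixed costs, Eq. (5.22), to the proportional costs."""
    return n_steps * nu_fixed / proportional_cost(nu, sigma1, x0, n_steps)

x0, sigma1 = 100.0, 0.01
low = cost_to_price_ratio(1e-4 * x0, sigma1, x0, n_steps=100)    # about 1%
high = cost_to_price_ratio(1e-2 * x0, sigma1, x0, n_steps=100)   # order 1
print(low, high)
```

With these prefactors the ratio of Eq. (5.21) equals $\sqrt{2/\pi}\,\nu/(\sigma_1 x_0)$, which is independent of $N$, whereas the fixed-to-proportional ratio grows as $\sqrt{N}$.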


5.2 Other types of options: ‘Puts’ and ‘exotic options’ 

5.2.1 ‘Put-call’ parity 
A 'put' contract is a sell option, defined by a pay-off at maturity given by
$\max(x_s - x, 0)$. A put protects its owner against the risk that the shares he owns
drop below the strike price $x_s$. Puts on stock indices like the S&P 500 are very
popular. The price of a European put will be noted $C^\dagger[x_0, x_s, T]$ (we
reserve the notation $P$ for a probability). This price can be obtained using a very
simple 'parity' (or no arbitrage) argument. Suppose that one simultaneously buys a
call with strike price $x_s$, and sells a put with the same maturity and strike
price. At expiry, this combination is therefore worth:

$$\max(x(T) - x_s, 0) - \max(x_s - x(T), 0) \equiv x(T) - x_s. \tag{5.24}$$

Exactly the same pay-off can be achieved by buying the underlying now, and selling
a bond paying $x_s$ at maturity. Therefore, the above combination must be worth:

$$C[x_0, x_s, T] - C^\dagger[x_0, x_s, T] = x_0 - x_s e^{-rT}, \tag{5.25}$$

which allows one to express the price of a put knowing that of a call. If interest rate
effects are small ($rT \ll 1$), this relation reads:

$$C^\dagger[x_0, x_s, T] \simeq C[x_0, x_s, T] + x_s - x_0. \tag{5.26}$$

Note in particular that at-the-money ($x_s = x_0$), the two contracts have the same
value, which is obvious by symmetry (again, in the absence of interest rate effects).
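The parity relation can be verified numerically in the zero-rate case, Eq. (5.26), by averaging the call and put pay-offs over the same terminal distribution. The sketch below assumes a Gaussian terminal price centred on $x_0$; the function name, the integration grid and the numerical values are illustrative assumptions.

```python
import math

def gaussian_pay_off_average(pay_off, x0, sigma_total, n_grid=20_000, width=10.0):
    """Average of pay_off(x) over a Gaussian of mean x0 and std sigma_total,
    computed with a plain Riemann sum on [x0 - width*std, x0 + width*std]."""
    lo = x0 - width * sigma_total
    h = 2.0 * width * sigma_total / n_grid
    norm = sigma_total * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(n_grid + 1):
        x = lo + i * h
        density = math.exp(-0.5 * ((x - x0) / sigma_total) ** 2) / norm
        total += density * pay_off(x)
    return total * h

# Illustrative values: x0 = 100, strike 97, terminal std 10, r = 0.
x0, strike, sigma_total = 100.0, 97.0, 10.0
call = gaussian_pay_off_average(lambda x: max(x - strike, 0.0), x0, sigma_total)
put = gaussian_pay_off_average(lambda x: max(strike - x, 0.0), x0, sigma_total)
print(call, put, call - put)
```

The difference between the two prices reproduces $x_0 - x_s$ up to quadrature error, whatever the width of the distribution.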


5.2.2 ‘Digital’ options


More general option contracts can stipulate that the pay-off is not the difference
between the value of the underlying at maturity $x_N$ and the strike price $x_s$, but
rather an arbitrary function $Y(x_N)$ of $x_N$. For example, a 'digital' option is a
pure bet, in the sense that it pays a fixed premium $Y_0$ whenever $x_N$ exceeds
$x_s$. Therefore:

$$Y(x_N) = Y_0 \quad (x_N > x_s); \qquad Y(x_N) = 0 \quad (x_N < x_s). \tag{5.27}$$

The price of the option in this case can be obtained following the same lines as
above. In particular, in the absence of bias (i.e. for $m = 0$) the fair price is given
by:

$$C_Y(x_0, N) = \langle Y(x_N) \rangle = \int Y(x)\, P_0(x, N|x_0, 0)\, dx, \tag{5.28}$$

whereas the optimal strategy is still given by the general formula, Eq. (4...). In
particular, for Gaussian fluctuations, one always finds the Black-Scholes recipe:

$$\phi^*_Y(x) \equiv \frac{\partial C_Y[x, N]}{\partial x}. \tag{5.29}$$

The case of a non-zero average return can be treated as above, in Section 4.5.1. To
first order in $m$, the price of the option is given by:
i} 
fF 
{ 

mT = aol . Si 
Cm memo MES A Fe melo 


vty | 


which reveals, again, that in the Gaussian case, the average return disappears from the
final formula. In the presence of non-zero kurtosis, however, a (small) systematic correction
to the fair price appears. Note that $C_{Y,m}$ can be written as an average of $Y(x)$ using the
effective, 'risk neutral' probability $Q(x)$ introduced in Section 4.5.1. If the $\Delta$-hedge is
used, this risk neutral probability is simply $P_0$ (see Appendix F).


5.2.3 ‘Asian’ options 

The problem is slightly more complicated in the case of the so-called 'Asian'
options. The pay-off of these options is calculated not on the value of the
underlying stock at maturity, but on a certain average of this value over a certain
number of days preceding maturity. This procedure is used to prevent an artificial
rise of the stock price precisely on the expiry date, a rise that could be triggered
by an operator having an important long position on the corresponding option. The
contract is thus constructed on a fictitious asset, the price of which is defined
as:

$$\tilde x = \sum_{k=0}^{N} w_k x_k, \tag{5.31}$$

where the $\{w_k\}$'s are some weights, normalized such that $\sum_{k=0}^{N} w_k = 1$,
which define the averaging procedure. The simplest case corresponds to:

$$w_N = w_{N-1} = \cdots = w_{N-K+1} = \frac{1}{K}; \qquad w_k = 0 \quad (k < N-K+1), \tag{5.32}$$

where the average is taken over the last $K$ days of the option life. One could
however consider more complicated situations, for example an exponential average
($w_k \propto s^{N-k}$). The wealth balance then contains the modified pay-off:
$\max(\tilde x - x_s, 0)$, or more generally $Y(\tilde x)$. The first problem therefore
concerns the statistics of $\tilde x$. As we shall see, this problem is very similar to
the case encountered in Chapter 4 where the volatility is time dependent. Indeed,
one has:


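$$\sum_{i=0}^{N} w_i x_i = \sum_{i=0}^{N} w_i \left( \sum_{\ell=0}^{i-1} \delta x_\ell + x_0 \right) = x_0 + \sum_{k=0}^{N-1} \gamma_k\, \delta x_k, \tag{5.33}$$

where

$$\gamma_k \equiv \sum_{i=k+1}^{N} w_i. \tag{5.34}$$

Said differently, everything goes as if the price did not vary by an amount
$\delta x_k$, but by an amount $\delta y_k = \gamma_k \delta x_k$, distributed as:

$$\frac{1}{\gamma_k} P_1\left( \frac{\delta y_k}{\gamma_k} \right). \tag{5.35}$$

In the case of Gaussian fluctuations of variance $D\tau$, one thus finds:

$$P(\tilde x, N|x_0, 0) = \frac{1}{\sqrt{2\pi \tilde D N\tau}} \exp\left[ -\frac{(\tilde x - x_0)^2}{2\tilde D N\tau} \right], \tag{5.36}$$

where

$$\tilde D = \frac{D}{N} \sum_{k=0}^{N-1} \gamma_k^2. \tag{5.37}$$

More generally, $P(\tilde x, N|x_0, 0)$ is the Fourier transform of

$$\prod_{k=0}^{N-1} \hat P_1(\gamma_k z). \tag{5.38}$$

This information is sufficient to fix the option price (in the limit where the average
return is very small) through:

$$C_{\mathrm{asi}}[x_0, x_s, N] = \int_{x_s}^{\infty} (\tilde x - x_s)\, P(\tilde x, N|x_0, 0)\, d\tilde x. \tag{5.39}$$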


In order to fix the optimal strategy, one must however calculate the following
quantity:

$$P(\tilde x, N|x, k)\, \langle \delta x_k \rangle \big|_{(x,k) \to (\tilde x, N)}, \tag{5.40}$$

conditioned to a certain terminal value for $\tilde x$ (cf. Eq. (4.74)). The general
calculation is given in Appendix D. For a small kurtosis, the optimal strategy reads:

$$\phi_k^{N*}(x) = \frac{\partial C_{\mathrm{asi}}[x, x_s, N-k]}{\partial x} + \frac{\kappa_1 D\tau}{6}\, \gamma_k^2\, \frac{\partial^3 C_{\mathrm{asi}}[x, x_s, N-k]}{\partial x^3}. \tag{5.41}$$


² The case of a multiplicative process is more involved: see, e.g. H. Geman, M. Yor, Bessel processes, Asian
options and perpetuities, Mathematical Finance, 3, 349 (1993).


Note that if the instant of time '$k$' is outside the averaging period, one has
$\gamma_k = 1$ (since $\sum_{i>k} w_i = 1$), and the formula, Eq. (4.80), is recovered.
If on the contrary $k$ gets closer to maturity, $\gamma_k$ diminishes as does the
correction term.
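The coefficients $\gamma_k$ of Eq. (5.34) and the effective diffusion constant of Eq. (5.37) are straightforward to evaluate for a given averaging window. The sketch below does so for the simplest case of a flat average over the last $K$ days; the function names are hypothetical and the values $N = 100$, $K = 20$ are chosen only for illustration.

```python
def gamma_weights(weights):
    """gamma_k = sum_{i=k+1}^{N} w_i for k = 0..N-1, Eq. (5.34).
    `weights` holds w_0..w_N."""
    n = len(weights) - 1
    gammas = [0.0] * n
    running = 0.0
    for k in range(n - 1, -1, -1):
        running += weights[k + 1]
        gammas[k] = running
    return gammas

def effective_diffusion_ratio(n_steps, k_days):
    """Ratio D_tilde / D of Eq. (5.37) for a flat average over the last
    k_days prices (w_i = 1/K for i > N - K, zero otherwise)."""
    weights = [0.0] * (n_steps + 1)
    for i in range(n_steps - k_days + 1, n_steps + 1):
        weights[i] = 1.0 / k_days
    gammas = gamma_weights(weights)
    return sum(g * g for g in gammas) / n_steps

ratio_plain = effective_diffusion_ratio(100, 1)    # no averaging
ratio_asian = effective_diffusion_ratio(100, 20)   # 20-day average
print(ratio_plain, ratio_asian)
```

Averaging over the last 20 of 100 days lowers the effective variance by roughly 13%, which is why such a contract is cheaper than the corresponding plain option in the Gaussian case.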


5.2.4 ‘American’ options 


We have up to now focused our attention on 'European'-type options, which can
only be exercised on the day of expiry. In reality, most traded options on organized
markets can be exercised at any time between the emission date and the expiry
date: by definition, these are called 'American' options. It is obvious that the price
of American options must be greater than or equal to the price of a European option
with the same maturity and strike price, since the contract is a priori more
favourable to the buyer. The pricing problem is therefore more difficult, since the
writer of the option must first determine the optimal strategy that the buyer can
follow in order to fix a fair price. Now, in the absence of dividends, the optimal
strategy for the buyer of a call option is to keep it until the expiry date, thereby
converting de facto the option into a European option. Intuitively, this is due to the
fact that the average $\langle \max(x_N - x_s, 0) \rangle$ grows with $N$, hence the
average pay-off is higher if one waits longer. The argument can be made more
convincing as follows. Let us define a 'two-shot' option, of strike $x_s$, which can
only be exercised at times $N_1$ and $N_2 > N_1$. At time $N_1$, the buyer of the
option may choose to exercise a fraction $f(x_1)$ of the option, which in principle
depends on the current price of the underlying $x_1$. The remaining part of the option
can then be exercised at time $N_2$. What is the average profit $\langle G \rangle$ of
the buyer at time $N_2$?
Considering the two possible cases, one obtains:


$$\langle G \rangle = \int_{x_s}^{+\infty} (x_2 - x_s)\, dx_2 \int P(x_2, N_2|x_1, N_1)\, [1 - f(x_1)]\, P(x_1, N_1|x_0, 0)\, dx_1 + \int_{x_s}^{+\infty} f(x_1)\,(x_1 - x_s)\, e^{r\tau(N_2 - N_1)}\, P(x_1, N_1|x_0, 0)\, dx_1, \tag{5.42}$$

which can be rewritten as:

$$\langle G \rangle = C[x_0, x_s, N_2]\, e^{r\tau N_2} + \int_{x_s}^{+\infty} f(x_1)\, P(x_1, N_1|x_0, 0)\, \left( x_1 - x_s - C[x_1, x_s, N_2 - N_1] \right) e^{r\tau(N_2 - N_1)}\, dx_1. \tag{5.43}$$

The last expression means that if the buyer exercises a fraction $f(x_1)$ of his option,
he pockets immediately the difference $x_1 - x_s$, but loses de facto his option, which
is worth $C[x_1, x_s, N_2 - N_1]$.


³ Options that can be exercised at certain specific dates (more than one) are called 'Bermudan' options.
The optimal strategy, such that $\langle G \rangle$ is maximum, therefore consists in
choosing $f(x_1)$ equal to 0 or 1, according to the sign of
$x_1 - x_s - C[x_1, x_s, N_2 - N_1]$. Now, this difference is always negative, whatever
$x_1$ and $N_2 - N_1$. This is due to the put-call parity relation (cf. Eq. (5.26)):

$$C^\dagger[x_1, x_s, N_2 - N_1] = C[x_1, x_s, N_2 - N_1] - (x_1 - x_s) - x_s\left(1 - e^{-r\tau(N_2 - N_1)}\right). \tag{5.44}$$

Since $C^\dagger \geq 0$, $C[x_1, x_s, N_2 - N_1] - (x_1 - x_s)$ is also greater than or
equal to zero.

The optimal value of $f(x_1)$ is thus zero; said differently the buyer should wait
until maturity to exercise his option to maximize his average profit. This argument
can be generalized to the case where the option can be exercised at any instant
$N_1, N_2, \ldots, N_n$ with $n$ arbitrary.

Note however that choosing a non-zero $f$ increases the total probability of
exercising the option, but reduces the average profit! More precisely, the total
probability to reach $x_s$ before maturity is twice the probability to exercise the
option at expiry (if the distribution of $\delta x$ is even, see Section 3.1.3). OTC
American options are therefore favourable to the writer of the option, since some
buyers might be tempted to exercise before expiry.

It is interesting to generalize the problem and consider the case where the two strike
prices $x_{s1}$ and $x_{s2}$ are different at times $N_1$ and $N_2$, in particular in the
case where $x_{s1} < x_{s2}$. The average profit, Eq. (5.43), is then equal to (for $r = 0$):

$$\langle G \rangle = C[x_0, x_{s2}, N_2] + \int f(x_1)\, P(x_1, N_1|x_0, 0)\, \left( x_1 - x_{s1} - C[x_1, x_{s2}, N_2 - N_1] \right) dx_1. \tag{5.45}$$

The equation

$$x^* - x_{s1} - C[x^*, x_{s2}, N_2 - N_1] = 0 \tag{5.46}$$

then has a non-trivial solution, leading to $f(x_1) = 1$ for $x_1 > x^*$. The average
profit of the buyer therefore increases, in this case, upon an early exercise of the option.


American puts 


Naively, the case of the American puts looks rather similar to that of the calls, and
these should therefore also be equivalent to European puts. This is not the case
for the following reason.⁴ Using the same argument as above, one finds that the
average profit associated to a 'two-shot' put option with exercise dates $N_1$, $N_2$ is
given by:

$$\langle G^\dagger \rangle = C^\dagger[x_0, x_s, N_2]\, e^{r\tau N_2} + \int f(x_1)\, P(x_1, N_1|x_0, 0)\, \left( x_s - x_1 - C^\dagger[x_1, x_s, N_2 - N_1] \right) e^{r\tau(N_2 - N_1)}\, dx_1. \tag{5.47}$$


⁴ The case of American calls with non-zero dividends is similar to the case discussed here.
Now, the difference $(x_s - x_1 - C^\dagger[x_1, x_s, N_2 - N_1])$ can be transformed,
using the put-call parity, as:

$$x_s\left[1 - e^{-r\tau(N_2 - N_1)}\right] - C[x_1, x_s, N_2 - N_1]. \tag{5.48}$$

This quantity may become positive if $C[x_1, x_s, N_2 - N_1]$ is very small, which
corresponds to $x_s \gg x_1$ (puts deep in the money). The smaller the value of $r$, the
larger should be the difference between $x_1$ and $x_s$, and the smaller the probability
for this to happen. If $r = 0$, the problem of American puts is identical to that of the
calls.

In the case where the quantity (5.48) becomes positive, an 'excess' average profit
$\delta G$ is generated, and represents the extra premium to be added to the price of the
European put to account for the possibility of an early exercise. Let us finally note
that the price of the American put $C^\dagger_{\mathrm{am}}$ is necessarily always larger
than or equal to $x_s - x$ (since this would be the immediate profit), and that the price
of the 'two-shot' put is a lower bound to $C^\dagger_{\mathrm{am}}$.


The perturbative calculation of $\delta G$ (and thus of the 'two-shot' option) in the limit of small
interest rates is not very difficult. As a function of $N_1$, $\delta G$ reaches a maximum between
$N_2/2$ and $N_2$. For an at-the-money put such that $N_2 = 100$, $r = 5\%$ annual, $\sigma = 1\%$
per day and $x_0 = x_s = 100$, the maximum is reached for $N_1 \simeq 80$ and the corresponding
$\delta G \simeq 0.15$. This must be compared with the price of the European put,
$C^\dagger_{\mathrm{eu}}$. The possibility of an early exercise leads in this case to a 5%
increase of the price of the option.

More generally, when the increments are independent and of average zero, one can
obtain a numerical value for the price of an American put $C^\dagger_{\mathrm{am}}$ by iterating
backwards the following exact equation:

$$C^\dagger_{\mathrm{am}}[x, x_s, N+1] = \max\left( x_s - x,\; e^{-\rho} \int P_1(\delta x)\, C^\dagger_{\mathrm{am}}[x + \delta x, x_s, N]\, d\delta x \right). \tag{5.49}$$

This equation means that the put is worth the average value of tomorrow's price if it is
not exercised today ($C^\dagger_{\mathrm{am}} > x_s - x$), or $x_s - x$ if it is immediately
exercised. Using this procedure, we have calculated the price of a European, American and
'two-shot' option of maturity 100 days (Fig. 5.1). For the 'two-shot' option, the optimal
value of $N_1$ as a function of the strike is shown in the inset.
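The backward iteration of Eq. (5.49) can be sketched on a lattice. The example below is not the calculation behind Fig. 5.1: it assumes, purely for illustration, that the daily increment takes the two values $\pm 1$ tick with equal probability (so that a tick of 1 on a price of 100 mimics a 1% daily volatility), and it uses a 365-day year to convert the annual rate. The function name is hypothetical.

```python
import math

def put_prices(x0, strike, n_steps, rho, tick=1.0):
    """Return (european, american) put prices on an additive walk whose
    increments are +tick or -tick with probability 1/2 (assumed P_1)."""
    # Terminal pay-off on the reachable lattice x0 + (2*i - n_steps) * tick.
    eu = [max(strike - (x0 + (2 * i - n_steps) * tick), 0.0)
          for i in range(n_steps + 1)]
    am = list(eu)
    disc = math.exp(-rho)                       # one-step discount factor
    for k in range(n_steps - 1, -1, -1):
        new_eu, new_am = [], []
        for i in range(k + 1):
            x = x0 + (2 * i - k) * tick         # price at node i of step k
            cont_eu = disc * 0.5 * (eu[i] + eu[i + 1])
            cont_am = disc * 0.5 * (am[i] + am[i + 1])
            new_eu.append(cont_eu)
            # Eq. (5.49): exercise now, or keep the discounted average value.
            new_am.append(max(strike - x, cont_am))
        eu, am = new_eu, new_am
    return eu[0], am[0]

rho = 0.05 / 365.0                              # assumed day count
european, american = put_prices(100.0, 100.0, 100, rho)
european_r0, american_r0 = put_prices(100.0, 100.0, 100, 0.0)
print(european, american, american / european - 1.0)
```

For a vanishing interest rate the two prices coincide, in line with the remark that the problem of American puts is then identical to that of the calls; for a positive rate the American put carries an early-exercise premium.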


5.2.5 ‘Barrier’ options 

Let us now turn to another family of options, called 'barrier' options, which are such that if
the price of the underlying $x_k$ reaches a certain 'barrier' value $x_b$ during the lifetime of the
option, the option is lost. (Conversely, there are options that are only activated if the value
$x_b$ is reached.) This clause leads to cheaper options, which can be more attractive to the
investor. Also, if $x_b > x_s$, the writer of the option limits his possible losses to $x_b - x_s$. What
is the probability $P_b(x, N|x_0, 0)$ for the final value of the underlying to be at $x$, conditioned
to the fact that the price has not reached the barrier value $x_b$?
Fig. 5.1. Price of a European, American and 'two-shot' put option as a function of the
strike, for a 100-days maturity and a daily volatility of 1% and r = 1%. The top curve is
the American price, while the bottom curve is the European price. In the inset is shown the
optimal exercise time $N_1$ as a function of the strike for the 'two-shot' option.


In some cases, it is possible to give an exact answer to this question, using the so-called
method of images. Let us suppose that for each time step, the price $x$ can only change by
a discrete amount, $\pm 1$ tick. The method of images is explained graphically in Figure 5.2:
one can notice that all the trajectories going through $x_b$ between $k = 0$ and $k = N$ have a
'mirror' trajectory, with a statistical weight precisely equal (for $m = 0$) to the one of the
trajectory one wishes to exclude. It is clear that the conditional probability we are looking
for is obtained by subtracting the weight of these image trajectories:

$$P_b(x, N|x_0, 0) = P(x, N|x_0, 0) - P(x, N|2x_b - x_0, 0). \tag{5.50}$$
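For $\pm 1$ tick increments, Eq. (5.50) can be checked exactly by counting paths, since every path has the same weight $2^{-N}$. The sketch below compares a direct count of the trajectories that never touch the barrier with the difference of two binomial counts; the function names are hypothetical, and the values $x_0 = -5$, $x_b = 0$, $N = 20$ are those of the illustration in Fig. 5.2.

```python
from math import comb

def free_paths(start, end, n):
    """Number of n-step +/-1 paths from start to end (no barrier)."""
    d = end - start
    if abs(d) > n or (n + d) % 2:
        return 0
    return comb(n, (n + d) // 2)

def surviving_paths(start, end, n, barrier):
    """Number of n-step +/-1 paths from start to end that never touch
    the barrier, obtained by propagating counts step by step."""
    counts = {start: 1}
    for _ in range(n):
        nxt = {}
        for x, c in counts.items():
            for step in (-1, 1):
                y = x + step
                if y != barrier:            # paths touching the barrier are lost
                    nxt[y] = nxt.get(y, 0) + c
        counts = nxt
    return counts.get(end, 0)

x0, xb, n = -5, 0, 20
direct = surviving_paths(x0, -1, n, xb)
images = free_paths(x0, -1, n) - free_paths(2 * xb - x0, -1, n)
print(direct, images)
```

Dividing either count by $2^{N}$ gives the probability $P_b$; the two counts agree for every final point on the same side of the barrier as $x_0$.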


In the general case where the variations of $x$ are not limited to $0, \pm 1$, the previous
argument fails, as one can easily be convinced by considering the case where $\delta x$ takes the
values $\pm 1$ and $\pm 2$. However, if the possible variations of the price during the time $\tau$ are
small, the error coming from the uncertainty about the exact crossing time is small, and
leads to an error on the price $C_b$ of the barrier option on the order of $\langle |\delta x| \rangle$ times the total
probability of ever touching the barrier. Discarding this correction, the price of barrier
options reads:

$$C_b[x_0, x_s, N] = \int_{x_s}^{\infty} (x - x_s)\left[ P(x, N|x_0, 0) - P(x, N|2x_b - x_0, 0) \right] dx \equiv C[x_0, x_s, N] - C[2x_b - x_0, x_s, N] \tag{5.51}$$

Fig. 5.2. Illustration of the method of images. A trajectory, starting from the point $x_0 = -5$
and reaching the point $x_{20} = -1$ can either touch or avoid the 'barrier' located at $x_b = 0$.
For each trajectory touching the barrier, as the one shown in the figure (squares), there
exists one single trajectory (circles) starting from the point $x_0 = +5$ and reaching the same
final point; only the last section of the trajectory (after the last crossing point) is common
to both trajectories. In the absence of bias, these two trajectories have exactly the same
statistical weight. The probability of reaching the final point without crossing $x_b = 0$ can
thus be obtained by subtracting the weight of the image trajectories. Note that the whole
argument is wrong if jump sizes are not constant (for example when $\delta x = \pm 1$ or $\pm 2$).


($x_b < x_0$), or

$$C_b[x_0, x_s, N] = \int_{x_s}^{x_b} (x - x_s)\left[ P(x, N|x_0, 0) - P(x, N|2x_b - x_0, 0) \right] dx, \tag{5.52}$$

($x_b > x_s$); the option is worthless whenever $x_0 < x_b < x_s$.

One can also find 'double barrier' options, such that the price is constrained to remain
within a certain channel $x_b^- < x < x_b^+$, or else the option vanishes. One can generalize
the method of images to this case. The images are now successive reflections of the starting
point $x_0$ in the two parallel 'mirrors' $x_b^-$, $x_b^+$.


Other types of option 


One can find many other types of option, which we shall not discuss further.
Some options, for example, are calculated on the maximum value of the price
of the underlying reached during a certain period. It is clear that in this case, a 
Gaussian or log-normal model is particularly inadequate, since the price of the 
option is governed by extreme events. Only an adequate treatment of the tails of 
the distribution can allow us to price this type of option correctly. 
5.3 The ‘Greeks’ and risk control


The 'Greeks', which is the traditional name given by professionals to the derivative
of the price of an option with respect to the price of the underlying, the volatility,
etc., are often used for local risk control purposes. Indeed, if one assumes that the
underlying asset does not vary too much between two instants of time $t$ and $t + \tau$,
one may expand the variation of the option price in Taylor series:

$$\delta C = \Delta\, \delta x + \frac{1}{2}\Gamma (\delta x)^2 + \mathcal{V}\, \delta\sigma + \Theta\tau, \tag{5.53}$$

where $\delta x$ is the change of price of the underlying. If the option is hedged by
simultaneously selling a proportion $\phi$ of the underlying asset, one finds that the
change of the portfolio value is, to this order:

$$\delta W = (\Delta - \phi)\, \delta x + \frac{1}{2}\Gamma (\delta x)^2 + \mathcal{V}\, \delta\sigma + \Theta\tau. \tag{5.54}$$

Note that the Black-Scholes (or rather, Bachelier) equation is recovered by setting
$\phi^* = \Delta$, $\delta\sigma \equiv 0$, and by recalling that for a continuous-time Gaussian
process, $(\delta x)^2 \equiv D\tau$ (see Section 4.5.2). In this case, the portfolio does not
change with time ($\delta W = 0$), provided that $\Theta = -D\Gamma/2$, which is precisely
Eq. (4.51) in the limit $\tau \to 0$.

In reality, due to the non-Gaussian nature of $\delta x$, the large risk corresponds to
cases where $\Gamma(\delta x)^2 \gg |\Theta\tau|$. Assuming that one chooses to follow the
$\Delta$-hedge procedure (which is in general suboptimal, see Section 4.4.3 above), one
finds that the fluctuations of the price of the underlying lead to an increase in the
value of the portfolio of the buyer of the option (since $\Gamma > 0$). Losses can only
occur if the implied volatility of the underlying decreases. If $\delta x$ and
$\delta\sigma$ are uncorrelated (which is in general not true), one finds that the
'instantaneous' variance of the portfolio is given by:

$$\langle (\delta W)^2 \rangle = \frac{3 + \kappa_1}{4}\, \Gamma^2 \langle \delta x^2 \rangle^2 + \mathcal{V}^2 \langle \delta\sigma^2 \rangle, \tag{5.55}$$

where $\kappa_1$ is the kurtosis of $\delta x$. For an at-the-money option of maturity
$T$, one has:

$$\Gamma \propto \frac{1}{\sigma x_0 \sqrt{T}}, \qquad \mathcal{V} \propto x_0 \sqrt{T}. \tag{5.56}$$

Typical values are, on the scale of $\tau$ = one day, $\kappa_1 = 3$ and
$\delta\sigma \sim \sigma$. The $\Gamma$ contribution to risk is therefore on the order of
$\sigma x_0 \tau/\sqrt{T}$. This is equal to the typical fluctuations of the underlying
contract multiplied by $\sqrt{\tau/T}$, or else the price of the option reduced by a
factor $N = T/\tau$. The Vega contribution is much larger for long maturities, since
it is of order of the price of the option itself.
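The orders of magnitude quoted for the Gamma and Vega contributions can be made concrete with a few lines of arithmetic. The sketch below drops all numerical prefactors, as the text does; the function name and the chosen values (daily volatility of 1%, $x_0 = 100$, $T = 100$ days) are illustrative assumptions.

```python
import math

def gamma_and_vega_risk(sigma, x0, tau, maturity, rel_vol_change=1.0):
    """Order-of-magnitude Gamma and Vega contributions to the daily risk of
    an at-the-money option, with all numerical prefactors dropped."""
    gamma = 1.0 / (sigma * x0 * math.sqrt(maturity))     # Gamma ~ 1/(sigma x0 sqrt(T))
    vega = x0 * math.sqrt(maturity)                      # Vega ~ x0 sqrt(T)
    gamma_risk = gamma * (sigma * x0) ** 2 * tau         # Gamma * <dx^2>
    vega_risk = vega * rel_vol_change * sigma            # Vega * delta sigma
    return gamma_risk, vega_risk

# sigma is the volatility per square-root day; tau = 1 day, T = 100 days.
gamma_risk, vega_risk = gamma_and_vega_risk(sigma=0.01, x0=100.0, tau=1.0,
                                            maturity=100.0)
option_scale = 0.01 * 100.0 * math.sqrt(100.0)           # sigma x0 sqrt(T)
print(gamma_risk, vega_risk, option_scale)
```

With these numbers the Gamma term is of order 0.1, i.e. the option price scale (10) divided by $N = 100$, while a volatility move $\delta\sigma \sim \sigma$ produces a Vega term of the same order as the option price itself.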
5.4 Value-at-risk for general non-linear portfolios (*)


A very important issue for the control of risk of complex portfolios, which involve
many non-linear assets, is to be able to estimate their value-at-risk reliably. This is
a difficult problem, since both the non-Gaussian nature of the fluctuations of the
underlying assets and the non-linearities of the price of the derivatives must be dealt
with. A solution, which is very costly in terms of computation time and not very
precise, is the use of Monte-Carlo simulations. We shall show in this section that
in the case where the fluctuations of the 'explicative variables' are strong (a more
precise statement will be made below), an approximate formula can be obtained
for the value-at-risk of a general non-linear portfolio.

Let us assume that the variations of the value of the portfolio can be written
as a function $\delta f(e_1, e_2, \ldots, e_M)$ of a set of $M$ independent random variables
$e_a$, $a = 1, \ldots, M$, such that $\langle e_a \rangle = 0$ and
$\langle e_a e_b \rangle = \delta_{a,b}\sigma_a^2$. The sensitivity of the
portfolio to these 'explicative variables' can be measured as the derivatives of the
value of the portfolio with respect to the $e_a$. We shall therefore introduce the
$\Delta$'s and $\Gamma$'s as:

$$\Delta_a = \frac{\partial f}{\partial e_a}, \qquad \Gamma_{a,b} = \frac{\partial^2 f}{\partial e_a \partial e_b}. \tag{5.57}$$


We are interested in the probability for a large fluctuation $\delta f^*$ of the portfolio.
We will surmise that this is due to a particularly large fluctuation of one explicative
factor, say $a = 1$, that we will call the dominant factor. This is not always true,
and depends on the statistics of the fluctuations of the $e_a$. A condition for this
assumption to be true will be discussed below, and requires in particular that the tail
of the dominant factor should not decrease faster than an exponential. Fortunately,
this is a good assumption in financial markets.

The aim is to compute the value-at-risk of a certain portfolio, i.e. the value
$\delta f^*$ such that the probability that the variation of $f$ exceeds $\delta f^*$ is equal
to a certain probability $p$: $P_>(\delta f^*) = p$. Our assumption about the existence of a
dominant factor means that these events correspond to a market configuration where the
fluctuation $\delta e_1$ is large, whereas all other factors are relatively small. Therefore,
the large variations of the portfolio can be approximated as:

$$\delta f(e_1, e_2, \ldots, e_M) = \delta f(e_1) + \sum_{a=2}^{M} \Delta_a e_a + \frac{1}{2} \sum_{a,b=2}^{M} \Gamma_{a,b}\, e_a e_b, \tag{5.58}$$

where $\delta f(e_1)$ is a shorthand notation for $\delta f(e_1, 0, \ldots, 0)$. Now, we use
the fact that:

$$P_>(\delta f^*) = \int P(e_1, e_2, \ldots, e_M)\, \Theta\left( \delta f(e_1, e_2, \ldots, e_M) - \delta f^* \right) \prod_{a=1}^{M} de_a, \tag{5.59}$$
where $\Theta(x > 0) = 1$ and $\Theta(x < 0) = 0$. Expanding the $\Theta$ function to second
order leads to:

$$\Theta(\delta f(e_1) - \delta f^*) + \left[ \sum_{a=2}^{M} \Delta_a e_a + \frac{1}{2} \sum_{a,b=2}^{M} \Gamma_{a,b}\, e_a e_b \right] \delta(\delta f(e_1) - \delta f^*) + \frac{1}{2} \sum_{a,b=2}^{M} \Delta_a \Delta_b\, e_a e_b\, \delta'(\delta f(e_1) - \delta f^*), \tag{5.60}$$

where $\delta'$ is the derivative of the $\delta$-function with respect to $\delta f$. In
order to proceed with the integration over the variables $e_a$ in Eq. (5.59), one should
furthermore note the following identity:

$$\delta(\delta f(e_1) - \delta f^*) = \frac{1}{\Delta_1^*}\, \delta(e_1 - e_1^*), \tag{5.61}$$

where $e_1^*$ is such that $\delta f(e_1^*) = \delta f^*$, and $\Delta_1^*$ is computed for
$e_1 = e_1^*$, $e_{a>1} = 0$.

Inserting the above expansion of the $\Theta$ function into Eq. (5.59) and performing the
integration over the $e_a$ then leads to:


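$$P_>(\delta f^*) = P_>(e_1^*) + \sum_{a=2}^{M} \frac{\Gamma_{a,a}^* \sigma_a^2}{2\Delta_1^*}\, P(e_1^*) - \sum_{a=2}^{M} \frac{\Delta_a^{*2} \sigma_a^2}{2\Delta_1^{*2}} \left( P'(e_1^*) - \frac{\Gamma_{1,1}^*}{\Delta_1^*}\, P(e_1^*) \right), \tag{5.62}$$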
where $P(e_1)$ is the probability distribution of the first factor, defined as:

$$P(e_1) = \int P(e_1, e_2, \ldots, e_M) \prod_{a=2}^{M} de_a. \tag{5.63}$$

In order to find the value-at-risk $\delta f^*$, one should thus solve Eq. (5.62) for
$e_1^*$ with $P_>(\delta f^*) = p$, and then compute $\delta f(e_1^*, 0, \ldots, 0)$. Note
that the equation is not trivial since the Greeks must be estimated at the solution
point $e_1^*$.

Let us discuss the general result, Eq. (5.62), in the simple case of a linear 
portfolio of assets, such that no convexity is present: the A,’s are constant and 
the F's are all zero. The equation then takes the following simpler form: 

$$\mathcal{P}_>(e_1^*) - \sum_{a=2}^{M} \frac{\Delta_a^2 \sigma_a^2}{2\Delta_1^2}\, P'(e_1^*) = p. \tag{5.64}$$



Naively, one could have thought that in the dominant factor approximation, the
value of $e_1^*$ would be the value-at-risk value of $e_1$ for the probability $p$, defined as:

$$\mathcal{P}_>(e_{1,\mathrm{VaR}}) = p. \tag{5.65}$$


However, Eq. (5.64) shows that there is a correction term proportional to
$P'(e_1^*)$. Since the latter quantity is negative, one sees that $e_1^*$ is actually larger than
$e_{1,\mathrm{VaR}}$, and therefore $\delta f^* > \delta f(e_{1,\mathrm{VaR}})$. This reflects the effect of all other factors,
which tend to increase the value-at-risk of the portfolio.
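
As a purely illustrative sketch, Eq. (5.64) can be solved numerically by bisection once a tail shape is chosen for the dominant factor. The pure power-law tail, the parameter values and all function names below are assumptions made for the example, not part of the text; the constant `c` stands for $\sum_{a \geq 2} \Delta_a^2 \sigma_a^2 / (2\Delta_1^2)$.

```python
def tail_prob(e, mu=3.0, e0=1.0):
    """P_>(e) for an assumed pure power-law tail, valid for e >= e0."""
    return (e0 / e) ** mu

def density_slope(e, mu=3.0, e0=1.0):
    """P'(e), the derivative of the density P(e) = mu * e0**mu / e**(1 + mu)."""
    return -mu * (1.0 + mu) * e0 ** mu / e ** (2.0 + mu)

def solve_e1_star(p, c, mu=3.0, e0=1.0, tol=1e-12):
    """Solve P_>(e) - c * P'(e) = p (Eq. 5.64) for e by bisection."""
    def g(e):
        return tail_prob(e, mu, e0) - c * density_slope(e, mu, e0) - p

    lo = e0 * p ** (-1.0 / mu)   # naive e_{1,VaR}; g(lo) > 0 because P' < 0
    hi = 2.0 * lo
    while g(hi) > 0.0:           # expand the bracket until g changes sign
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p = 0.01                          # tail probability (99% level)
c = 0.5                           # hypothetical weight of the non-dominant factors
e_var = p ** (-1.0 / 3.0)         # solution of P_>(e) = p for mu = 3, e0 = 1
e_star = solve_e1_star(p, c)
print(e_var, e_star)
```

In this toy setting the root lies above the naive value `e_var`, as the sign of the correction term suggests; for a linear portfolio the value-at-risk then follows as $\Delta_1 e_1^*$.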

The result obtained above relies on a second-order expansion; when are
higher-order corrections negligible? It is easy to see that higher-order terms
involve higher-order derivatives of $P(e_1)$. A condition for these terms to be
negligible in the limit $p \to 0$, or $e_1^* \to \infty$, is that the successive
derivatives of $P(e_1)$ become smaller and smaller. This is true provided that
$P(e_1)$ decays more slowly than exponentially, for example as a power-law. On the
contrary, when $P(e_1)$ decays faster than exponentially (for example in the
Gaussian case), then the expansion proposed above completely loses its meaning,
since higher and higher corrections become dominant when $p \to 0$. This is
expected: in a Gaussian world, a large event results from the accidental
superposition of many small events, whereas in a power-law world, large events
are associated with one single large fluctuation which dominates over all the
others. The case where $P(e_1)$ decays as an exponential is interesting, since it
is often a good approximation for the tail of the fluctuations of financial
assets. Taking $P(e_1) \simeq \alpha_1 \exp(-\alpha_1 e_1)$, one finds that $e_1^*$
is the solution of:


$$e^{-\alpha_1 e_1^*} \left[1 + \sum_{a=2}^{M} \frac{\Delta_a^2 \alpha_1^2 \sigma_a^2}{2\Delta_1^2}\right] = p. \tag{5.66}$$


Since one has $\sigma_1^2 \propto \alpha_1^{-2}$, the correction term is small provided that the variance of
the portfolio generated by the dominant factor is much larger than the sum of the
variances of all other factors.
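
For the exponential tail, Eq. (5.66) can be inverted in closed form, which makes the size of the correction easy to inspect. The following is a minimal sketch under that assumption; the function name and the numerical inputs are hypothetical.

```python
import math

def e1_star_exponential(p, alpha1, delta1, other_deltas, other_sigmas):
    """Invert Eq. (5.66) for an assumed exponential tail P(e) = alpha1 * exp(-alpha1 * e).

    Returns the naive quantile e_{1,VaR}, the corrected e_1^*, and the
    dimensionless correction sum_a Delta_a^2 alpha1^2 sigma_a^2 / (2 Delta_1^2).
    """
    correction = sum((d * s) ** 2 for d, s in zip(other_deltas, other_sigmas))
    correction *= alpha1 ** 2 / (2.0 * delta1 ** 2)
    e_var = math.log(1.0 / p) / alpha1                   # solves exp(-alpha1 * e) = p
    e_star = math.log((1.0 + correction) / p) / alpha1   # solves Eq. (5.66)
    return e_var, e_star, correction

# Hypothetical inputs: one dominant factor and two weaker ones.
e_var, e_star, correction = e1_star_exponential(
    p=0.01, alpha1=1.0, delta1=1.0,
    other_deltas=[0.2, 0.1], other_sigmas=[1.0, 1.0],
)
print(e_var, e_star, correction)
```

The shift $e_1^* - e_{1,\mathrm{VaR}} = \log(1 + \text{correction})/\alpha_1$ does not depend on $p$ here, which reflects the fact that the exponential is the borderline case of the expansion.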

Coming back to Eq. (5.62), one expects that if the dominant factor is correctly
identified, and if the distribution is such that the above expansion makes sense, an
approximate solution is given by $e_1^* = e_{1,\mathrm{VaR}} + \epsilon$, with:


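$$\epsilon \simeq \sum_{a=2}^{M} \frac{\Gamma_{a,a} \sigma_a^2}{2\Delta_1} - \sum_{a=2}^{M} \frac{\Delta_a^2 \sigma_a^2}{2\Delta_1^2} \left(\frac{P'(e_{1,\mathrm{VaR}})}{P(e_{1,\mathrm{VaR}})} + \frac{\Gamma_{1,1}}{\Delta_1}\right), \tag{5.67}$$

where now all the Greeks are estimated at $e_{1,\mathrm{VaR}}$.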
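
The step from Eq. (5.62) to Eq. (5.67) is not spelled out in the text. One way to see it, assuming $\epsilon$ is small enough for a first-order expansion and evaluating all correction terms at $e_{1,\mathrm{VaR}}$, is the following sketch (with $P$ and $P'$ standing for $P(e_{1,\mathrm{VaR}})$ and $P'(e_{1,\mathrm{VaR}})$):

```latex
% First-order expansion of the tail probability around e_{1,VaR}:
\mathcal{P}_>(e_{1,\mathrm{VaR}} + \epsilon)
  \simeq \mathcal{P}_>(e_{1,\mathrm{VaR}}) - \epsilon\, P
  = p - \epsilon\, P,
\qquad \text{since} \quad \frac{d\mathcal{P}_>}{de_1} = -P(e_1).

% Insert into Eq. (5.62) and impose \mathcal{P}_>(\delta f^*) = p:
p - \epsilon\, P
  + \sum_{a=2}^{M} \frac{\Gamma_{a,a}\sigma_a^2}{2\Delta_1}\, P
  - \sum_{a=2}^{M} \frac{\Delta_a^2\sigma_a^2}{2\Delta_1^2}
    \left( P' + \frac{\Gamma_{1,1}}{\Delta_1}\, P \right) = p
\;\Longrightarrow\;
\epsilon \simeq \sum_{a=2}^{M} \frac{\Gamma_{a,a}\sigma_a^2}{2\Delta_1}
  - \sum_{a=2}^{M} \frac{\Delta_a^2\sigma_a^2}{2\Delta_1^2}
    \left( \frac{P'}{P} + \frac{\Gamma_{1,1}}{\Delta_1} \right).
```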

In some cases, it appears that a ‘one-factor’ approximation is not enough to 
reproduce the correct VaR value. This can be traced back to the fact that there are 
actually other different dangerous market configurations which contribute to the 
VaR. The above formalism can however easily be adapted to the case where two 
(or more) dangerous configurations need to be considered. The general equations 
read: 


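$$\mathcal{P}_{>a} = \mathcal{P}_>(e_a^*) + \sum_{b \neq a}^{M} \frac{\Gamma_{b,b}^* \sigma_b^2}{2\Delta_a^*}\, P(e_a^*) - \sum_{b \neq a}^{M} \frac{\Delta_b^{*2} \sigma_b^2}{2\Delta_a^{*2}} \left(P'(e_a^*) + \frac{\Gamma_{a,a}^*}{\Delta_a^*}\, P(e_a^*)\right), \tag{5.68}$$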


where $a = 1, \ldots, K$ are the $K$ different dangerous factors. The $e_a^*$, and therefore
$\delta f^*$, are determined by the following $K$ conditions:

$$\delta f^*(e_1^*) = \delta f^*(e_2^*) = \cdots = \delta f^*(e_K^*), \qquad \mathcal{P}_{>1} + \mathcal{P}_{>2} + \cdots + \mathcal{P}_{>K} = p. \tag{5.69}$$


5.5 Risk diversification (*) 


We have put the emphasis on the fact that for real world options, the Black-Scholes
divine surprise (i.e. the fact that the risk is zero) does not occur, and a non-zero
residual risk remains. One can ask whether this residual risk can be reduced further
by including other assets in the hedging portfolio. Buying stocks other than the
underlying to hedge an option can be called an ‘exogenous’ hedge. A related
question concerns the hedging of a ‘basket’ option, the pay-off of which is
calculated on a linear superposition of different assets. A rather common example
is that of ‘spread’ options, which depend on the difference of the price between 
two assets (for example the difference between the Nikkei and the S&P 500, or 
between the British and German interest rates, etc.). An interesting conclusion is 
that in the Gaussian case, an exogenous hedge increases the risk. An exogenous 
hedge is only useful in the presence of non-Gaussian effects. Another possibility is 
to hedge some options using different options; in other words, one can ask how to 
optimize a whole ‘book’ of options such that the global risk is minimum. 


‘Portfolio’ options and ‘exogenous’ hedging 


Let us suppose that one can buy $M$ assets $X^i$, $i = 1, \ldots, M$, the price of which is $x_k^i$
at time $k$. As in Chapter 3, we shall suppose that these assets can be decomposed over a
basis of independent factors $E^a$:


$$x_k^i = \sum_{a=1}^{M} O_{ia}\, e_k^a. \tag{5.70}$$


The $E^a$ are independent, of unit variance, and of distribution function $P_a$. The correlation
matrix of the fluctuations, $\langle \delta x^i \delta x^j \rangle$, is equal to $\sum_a O_{ia} O_{ja} = [OO^\dagger]_{ij}$.

One considers a general option constructed on a linear combination of all assets, such
that the pay-off depends on the value of

$$\tilde{x} = \sum_i f_i x^i \tag{5.71}$$


and is equal to $\mathcal{Y}(\tilde{x}) = \max(\tilde{x} - x_s, 0)$. The usual case of an option on the asset $X^1$
thus corresponds to $f_i = \delta_{i,1}$. A spread option on the difference $X^1 - X^2$ corresponds to
$f_i = \delta_{i,1} - \delta_{i,2}$, etc. The hedging portfolio at time $k$ is made of all the different assets $X^i$,
with weight $\phi_k^i$. The question is to determine the optimal composition of the portfolio, $\phi_k^{i*}$.

Following the general method explained in Section 4.3.3, one finds that the part of the risk
which depends on the strategy contains both a linear and a quadratic term in the $\phi$'s. Using
the fact that the $E^a$ are independent random variables, one can compute the functional
derivative of the risk with respect to all the $\phi_k^i(\{x\})$. Setting this functional derivative to
zero leads to:⁵


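$$\sum_j [OO^\dagger]_{ij}\, \phi_k^j = \int dz\, \hat{\mathcal{Y}}(z) \left[\prod_b \hat{P}_b\Big(z \sum_j f_j O_{jb}\Big)\right] \sum_a \frac{O_{ia}}{\sum_j f_j O_{ja}}\, \frac{\partial}{\partial (iz)} \log \hat{P}_a\Big(z \sum_j f_j O_{ja}\Big). \tag{5.72}$$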


Using the cumulant expansion of $P_a$ (assumed to be even), one finds that:


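$$\frac{\partial}{\partial (iz)} \log \hat{P}_a\Big(z \sum_j f_j O_{ja}\Big) = iz \Big(\sum_j f_j O_{ja}\Big)^2 - \frac{i z^3}{6}\, \kappa_a \Big(\sum_j f_j O_{ja}\Big)^4 + \cdots. \tag{5.73}$$
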
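The first term combines with

$$\frac{O_{ia}}{\sum_j f_j O_{ja}}, \tag{5.74}$$

to yield:

$$iz \sum_{a,j} O_{ia} O_{ja} f_j \equiv iz\, [OO^\dagger \cdot f]_i, \tag{5.75}$$

which finally leads to the following simple result:

$$\phi_k^{i*} = f_i\, \mathcal{P}[\{x_k^i\}, x_s, N - k], \tag{5.76}$$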


where $\mathcal{P}[\{x_k^i\}, x_s, N - k]$ is the probability for the option to be exercised, calculated at
time $k$. In other words, in the Gaussian case ($\kappa_a = 0$) the optimal portfolio is such that the
proportion of asset $i$ precisely reflects the weight of $i$ in the basket on which the option is
constructed. In particular, in the case of an option on a single asset, the hedging strategy
is not improved if one includes other assets, even if these assets are correlated with the
former.
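
To make the Gaussian result of Eq. (5.76) concrete, here is a small sketch for a spread option in the presence of a third, correlated asset. It assumes additive Gaussian factor increments of unit variance per time step and zero drift, so that the exercise probability is a normal cumulative distribution; the function names, the mixing matrix `O` and the prices are hypothetical.

```python
import math

def normal_cdf(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def gaussian_basket_hedge(f, O, x_now, x_strike, steps_left):
    """Hedge weights phi_i = f_i * P(exercise), Eq. (5.76), in the Gaussian case.

    Assumes each step adds sum_a O[i][a] * e_a to asset i, with independent
    standard normal e_a and no drift (an assumption of this sketch).
    """
    n_factors = len(O[0])
    # Loading of the basket on factor a: sum_j f_j O_ja
    loadings = [sum(f[j] * O[j][a] for j in range(len(f))) for a in range(n_factors)]
    var_per_step = sum(c * c for c in loadings)          # equals f . O O^T . f
    basket_now = sum(fi * xi for fi, xi in zip(f, x_now))
    u = (basket_now - x_strike) / math.sqrt(var_per_step * steps_left)
    prob_exercise = normal_cdf(u)
    return [fi * prob_exercise for fi in f], prob_exercise

# Spread option on X^1 - X^2; asset 3 is correlated with both but not in the basket.
f = [1.0, -1.0, 0.0]
O = [[1.0, 0.0, 0.0],
     [0.6, 0.8, 0.0],
     [0.3, 0.2, 0.5]]
weights_atm, prob_atm = gaussian_basket_hedge(f, O, [100.0, 98.0, 50.0], 2.0, 10)
weights_itm, prob_itm = gaussian_basket_hedge(f, O, [103.0, 98.0, 50.0], 2.0, 10)
print(weights_atm, weights_itm)
```

In both cases the weight on the third asset is zero, which illustrates the statement that correlated assets outside the basket do not improve the hedge when the fluctuations are Gaussian.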

However, this conclusion is only correct in the case of Gaussian fluctuations and does not
hold if the kurtosis is non-zero.⁶ In this case, an extra term appears, given by:


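$$\delta\phi_k^{i*} = \frac{1}{6} \left[\sum_{j,a} \kappa_a\, [OO^\dagger]^{-1}_{ij}\, O_{ja} \Big(\sum_l f_l O_{la}\Big)^3\right] \frac{\partial P(\tilde{x}, x_s, N - k)}{\partial x_s}. \tag{5.77}$$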


This correction is not, in general, proportional to $f_i$, and therefore suggests that, in some
cases, an exogenous hedge can be useful. However, one should note that this correction is
small for at-the-money options ($\tilde{x} = x_s$), since $\partial P(\tilde{x}, x_s, N - k)/\partial x_s = 0$.


5 In the following, i denotes the unit imaginary number, except when it appears as a subscript, in which case it 
is an asset label. 
6 The case of Lévy fluctuations is also such that an exogenous hedge is useless. 

Option portfolio

Since the risk associated with a single option is in general non-zero, the global
risk of a portfolio of options (“book”) is also non-zero. Suppose that the book
contains $p_i$ calls of ‘type’ $i$ ($i$ therefore contains the information of the strike $x_{si}$
and maturity $T_i$). The first problem to solve is that of the hedging strategy. In the
absence of volatility risk, it is not difficult to show that the optimal hedge for the
book is the linear superposition of the optimal strategies for each individual option:


$$\phi^*(x, t) = \sum_i p_i\, \phi_i^*(x, t). \tag{5.78}$$


The residual risk is then given by:

$$\mathcal{R}^{*2} = \sum_{i,j} p_i p_j C_{ij}, \tag{5.79}$$


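where the ‘correlation matrix’ $C$ is equal to:

$$C_{ij} = \big\langle \max(x(T_i) - x_{si}, 0)\, \max(x(T_j) - x_{sj}, 0) \big\rangle - \mathcal{C}_i \mathcal{C}_j - D\tau \sum_{k=0}^{N-1} \big\langle \phi_i^*(x, k\tau)\, \phi_j^*(x, k\tau) \big\rangle, \tag{5.80}$$
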
where $\mathcal{C}_i$ is the price of the option $i$. If the constraint on the $p_i$'s is of the form
$\sum_i p_i = 1$, the optimum portfolio is given by:

$$p_i^* = \frac{\sum_j C^{-1}_{ij}}{\sum_{i,j} C^{-1}_{ij}} \tag{5.81}$$

(remember that by assumption the mean return associated to an option is zero).

Let us finally note that we have not considered, in the above calculation, the
risk associated with volatility fluctuations, which is rather important in practice.
It is a common practice to try to hedge this volatility risk using other types
of options (for example, an exotic option can be hedged using a ‘plain vanilla’
option). A generalization of the Black-Scholes argument (assuming that option
prices themselves follow a Gaussian process, which is far from being the case)
suggests that the optimal strategy is to hold a fraction

$$\Delta = \frac{\partial \mathcal{C}_1 / \partial \sigma}{\partial \mathcal{C}_2 / \partial \sigma}$$

of options of type 2 to hedge the volatility risk associated with an option of
type 1. Using the formalism established in Chapter 4, one could work out the
correct hedging strategy, taking into account the non-Gaussian nature of the price
variations of options.
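
The minimum-risk book of Eq. (5.81) amounts to solving one linear system. The following is a minimal sketch using plain Gaussian elimination; the 2×2 matrix `C` is a made-up example and the helper names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def min_risk_weights(C):
    """p_i^* = sum_j C^{-1}_ij / sum_ij C^{-1}_ij, as in Eq. (5.81)."""
    y = solve_linear(C, [1.0] * len(C))                # y = C^{-1} (1, ..., 1)
    total = sum(y)
    return [v / total for v in y]

def residual_risk_sq(p, C):
    """Quadratic form sum_ij p_i p_j C_ij of Eq. (5.79)."""
    n = len(p)
    return sum(p[i] * C[i][j] * p[j] for i in range(n) for j in range(n))

C = [[4.0, 1.0],
     [1.0, 2.0]]                                       # hypothetical 'correlation matrix'
p_star = min_risk_weights(C)
risk_opt = residual_risk_sq(p_star, C)
risk_equal = residual_risk_sq([0.5, 0.5], C)
print(p_star, risk_opt, risk_equal)
```

Solving the system rather than inverting `C` explicitly is a design choice of this sketch; both give the same weights, and the optimal residual risk is lower than that of the equally weighted book in this example.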


Short glossary of financial terms


Arbitrage A trading strategy that generates profit without risk, from a zero initial 
investment. 

Basis point Elementary price increment, equal to $10^{-4}$ in relative value.

Bid-ask spread Difference between the ask price (at which one can buy an asset) 
and the bid price (at which one can sell the same asset). 

Bond (Zero coupon): Financial contract which pays a fixed value at a given date 
in the future. 

Delta Derivative of the price of an option with respect to the current price of the 
underlying contract. This is equal to the optimal hedging strategy in the 
Black-Scholes world. 

Drawdown Period of time during which the price of an asset is below its last 
historical peak. 

Forward Financial contract under which the owner agrees to buy for a fixed price 
some asset, at a fixed date in the future. 

Futures Same as a forward contract, but on an organized market. In this case, 
the contract is marked-to-market, and the owner pays (or receives) the 
marginal price change on a daily basis. 

Gamma Second derivative of the price of an option with respect to the current 
price of the underlying contract. This is equal to the derivative of the 
optimal hedging strategy in the Black-Scholes world. 

Hedging strategy A trading strategy allowing one to reduce, or sometimes to 
eliminate completely, the risk of a position. 

Moneyness Describes the difference between the spot price and the strike price of 
an option. For a call, if this difference is positive [resp. negative], the option 
is said to be in-the-money [resp. out-of-the-money]. If the difference is 
zero, the option is at-the-money. 

Option Financial contract allowing the owner to buy [or sell] at a fixed maximum 
[minimum] price (the strike price) some underlying asset in the future. 
This contract protects its owner against a possible rise or fall in price of
the underlying asset.

Over-the-counter This is said of a financial contract traded off market, say 
between two financial companies or banks. The price is then usually not 
publicly disclosed, at variance with organized markets. 

Spot price The current price of an asset for immediate delivery, in contrast with, 
for example, its forward price. 

Spot rate The value of the short-term interest rate. 

Spread Difference in price between two assets, or between two different prices of
the same asset, for example the bid-ask spread.

Strike price Price at which an option can be exercised, see Option. 

Vega Derivative of the price of an option with respect to the volatility of the 
underlying contract. 

Value at Risk (VaR) Measure of the potential losses of a given portfolio, associated
with a certain confidence level. For example, a 95% VaR corresponds to
the loss level that has a 5% probability of being exceeded.

Volatility Standard deviation of an asset’s relative price changes. 